Kung-Hsiang Steeve Huang (@steeve__huang) Twitter Tweets • TwiCopy

David Hendrickson

6 months ago

Now that we are fully immersed in the year of Agentic AI, nobody has created a benchmark to evaluate how well these agents work together...until now. 👇 @Salesforce's CRMArena-Pro builds upon CRMArena with nineteen expert-validated tasks across sales, service, and 'configure,

7 likes · 0 replies · 4 reposts

Salesforce AI Research

@sfresearch

6 months ago

CRMArena-Pro reveals why enterprise AI deployment remains challenging—many top-performing agents struggle significantly on real-world business tasks. 👇Full technical breakdown from our research lead Kung-Hsiang Steeve Huang below. #EnterpriseAI #AgenticAI #EGI

6 likes · 0 replies · 3 reposts

Caiming Xiong

@caimingxiong

6 months ago

AI agents are rapidly integrating into various industries; however, their full potential remains underutilized due to performance inconsistencies and enterprise hesitation. To alleviate these issues, we introduce <CRMArena-Pro>, a novel enterprise-agent benchmark for holistic and

61 likes · 2 replies · 18 reposts

Shizhe Diao

@shizhediao

6 months ago

Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough! Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering

382 likes · 17 replies · 64 reposts

Bony Bean

@bonybean

5 months ago

Salesforce AI Introduces CRMArena-Pro: The First Multi-Turn and Enterprise-Grade Benchmark for LLM Agents: ift.tt/zs73bwj

3 likes · 0 replies · 1 repost

Vlad Ruso PhD

@vlruso

5 months ago

Salesforce AI Launches CRMArena-Pro: A Game-Changer for Evaluating LLM Agents in Business #CRMArenaPro #LLMAgents #SalesforceAI #CustomerExperience #DataPrivacy itinai.com/salesforce-ai-… Understanding CRMArena-Pro: A New Benchmark for LLM Agents Salesforce AI has introduced CRM…

2 likes · 0 replies · 1 repost

Quantumbytz

@quantumbytz

5 months ago

Salesforce AI Introduces CRMArena-Pro: The First Multi-Turn and Enterprise-Grade Benchmark for LLM Agents #AI #MachineLearning #IoT #LLM marktechpost.com/2025/06/05/sal…

3 likes · 0 replies · 2 reposts

Kung-Hsiang Steeve Huang

@steeve__huang

5 months ago

Thanks Marktechpost AI Research News ⚡ for covering CRMArena-Pro 🙏 Our new benchmark reveals that even the best LLM agents achieve only ~58% success rate on realistic business tasks, dropping to 35% in multi-turn scenarios. Also, confidentiality awareness is nearly non-existent across all models

2 likes · 0 replies · 0 reposts

Silvio Savarese

@silviocinguetta

5 months ago

Synthesized data for #EnterpriseAI evaluation is an ethical imperative. CRMArena-Pro lets us rigorously test agents in a real-life business environment—a messy, multi-step, complex world—without putting sensitive data at risk. Proud of the team's work toward safer, more

13 likes · 2 replies · 5 reposts

Hou Pong (Ken) Chan

@kenchanhp

5 months ago

🚀 Discover how LLMs perceive their knowledge boundaries across languages in our #ACL2025 main paper! 🌍 By probing LLMs’ internal representations, we reveal key insights on where knowledge boundaries are encoded & propose a training-free method to combat cross-lingual

24 likes · 0 replies · 9 reposts

Chomba Bupe

@chombabupe

5 months ago

Another paper drop, this time from Salesforce: "These results underscore a significant gap between current LLM capabilities and real-world enterprise demands, highlighting needs for improved multi-turn reasoning, confidentiality adherence, and versatile skill acquisition."

517 likes · 16 replies · 105 reposts

Hou Pong (Ken) Chan

@kenchanhp

5 months ago

🚀We are thrilled to launch 'Lingshu' – A Generalist Medical Multi-modal Foundation Model! 🩻 🌟 Highlights of Lingshu: ⚕️ Unified knowledge across 12+ imaging modalities (X-Ray, CT, MRI & more!). 🧠 Enhanced reasoning & reduced hallucinations via novel data curation and

30 likes · 5 replies · 9 reposts

Pranav Venkit, PhD

@pranavvenkit

5 months ago

I’m really excited to be presenting this work in Greece! 🏛️ As generative text models start reshaping how we search for information, understanding their societal impact is more important than ever. 🔎 If you’ll be at #ACMFAccT2025, let’s grab a coffee and chat! ☕️

22 likes · 0 replies · 3 reposts

Salesforce AI Research

@sfresearch

5 months ago

1/10🎉New paper on AI Agent and LLM judge safety "Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows" As AI agents become increasingly autonomous, they often rely on feedback from judges (evaluators). These judges evaluate, critique, and

24 likes · 1 reply · 5 reposts

Yangyi Chen (on job market)

@yangyichen6666

5 months ago

🚀 I'm looking for full-time research scientist jobs on foundation models! I study pre-training and post-training of foundation models, and LLM-based coding agents. The figure highlights my research/publications. Please DM me if there is any good fit! Highly appreciated!

128 likes · 6 replies · 22 reposts

Dr. Theophano Mitsa ☦️🇬🇷🇺🇸

@theomitsa

5 months ago

arxiv.org/abs/2505.18878 Salesforce tried LLMs in Real Business Scenarios and Found Disappointing Performance Even from the Best

10 likes · 0 replies · 2 reposts

elvis

@omarsar0

5 months ago

Andrej Karpathy, great share as usual! Just read this related piece where a study showed issues with LLM-based agents not recognizing sensitive information and not adhering to appropriate data handling protocols: theregister.com/2025/06/16/sal… paper: arxiv.org/abs/2505.18878

37 likes · 0 replies · 5 reposts

Dion Hinchcliffe

@dhinchcliffe

5 months ago

4/ I’m actually bullish medium term involving AI in customer experience. But IT depts must educate themselves. The details on CRMArenaPro and the gap between LLMs / enterprise CRM needs in a major new paper by Salesforce AI Research’s Kung-Hsiang Steeve Huang + team: arxiv.org/abs/2505.18878

2 likes · 0 replies · 1 repost