
Kung-Hsiang Steeve Huang
@steeve__huang
Research Scientist @SFResearch | Formerly: PhD @UofIllinois, PhD Fellow @AmazonScience, MSc @USCViterbi, BEng @HKUST | He/him/his 🇹🇼 | #NLP
ID: 932462804056936448
http://khuangaf.github.io 20-11-2017 04:17:11
935 Tweets
1.1K Followers
274 Following


CRMArena-Pro reveals why enterprise AI deployment remains challenging—many top-performing agents struggle significantly on real-world business tasks. 👇Full technical breakdown from our research lead Kung-Hsiang Steeve Huang below. #EnterpriseAI #AgenticAI #EGI





Thanks Marktechpost AI Research News ⚡ for covering CRMArena-Pro 🙏 Our new benchmark reveals that even the best LLM agents achieve only a ~58% success rate on realistic business tasks, dropping to 35% in multi-turn scenarios. Also, confidentiality awareness is nearly non-existent across all models.









Andrej Karpathy Great share as usual! Just read this related piece about a study showing that LLM-based agents fail to recognize sensitive information and do not adhere to appropriate data handling protocols: theregister.com/2025/06/16/sal… paper: arxiv.org/abs/2505.18878

4/ I’m actually bullish in the medium term on involving AI in customer experience. But IT depts must educate themselves. The details on CRMArena-Pro and the gap between LLMs and enterprise CRM needs are in a major new paper by Salesforce AI Research’s Kung-Hsiang Steeve Huang + team: arxiv.org/abs/2505.18878