Shafiq Joty (@jotyshafiq)'s Twitter Profile
Shafiq Joty

@jotyshafiq

Sr. Research Director @Salesforce AI, Assoc. Prof @NTU (on leave)
Project lead of SFR-RAG, SFR-Judge, XGen

ID: 1111600251939377152

Website: https://raihanjoty.github.io
Joined: 29-03-2019 12:05:27

350 Tweets

903 Followers

431 Following

Revanth Reddy (On the Job Market) (@gangi_official)

The models and code are now public! Models on HF: huggingface.co/collections/Sa… Code: github.com/SalesforceAIRe… Project Page: salesforceairesearch.github.io/SweRank/ If you are interested in integrating the SweRank models as a plug-in within VS Code, please do reach out! We have more exciting

Salesforce AI Research (@sfresearch)

📣 SweRank Models & Code - Now Public! 📣 Following our recent announcement about AI-powered software issue localization, we’re excited to share: ✅ Models: huggingface.co/collections/Sa… ✅ Code: github.com/SalesforceAIRe… 📄 Paper: bit.ly/3S0x1fV 🎯 Demo:

Salesforce AI Research (@sfresearch)

🎉 Excited to share that 4 papers have been accepted to Transactions on Machine Learning Research! Follow Salesforce AI Research and bookmark this thread to stay updated on our latest #AI research. 📝 Shared Imagination: LLMs Hallucinate Alike 🖇️ arxiv.org/html/2407.1660… 📊 A

Salesforce AI Research (@sfresearch)

🇨🇦 Excited to present our work at Conference on Language Modeling in Montreal! Oct 7-10 at Palais des Congrès! 📄 Our accepted papers: CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval 👥Authors: Ye Liu, Rui Meng, Shafiq Joty, Silvio Savarese

Salesforce AI Research (@sfresearch)

🌟 Excited to present our work at Empirical Methods in Natural Language Processing (EMNLP 2025), a leading conference in NLP and AI research! 📄 Our accepted papers: Topic-Guided Reinforcement Learning with LLMs for Enhancing Multi-Document Summarization 👥Authors: Chuyuan Li

Salesforce AI Research (@sfresearch)

EMNLP 2025 / #EMNLP2025 Accepted Paper: Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text 📝 Paper: arxiv.org/abs/2507.19969 This work introduces Text2Vis, a comprehensive benchmark for evaluating text-to-visualization models

Shafiq Joty (@jotyshafiq)

Even the top-performing agents struggle on DashBoardQA. The best agent based on Gemini-Pro-2.5 achieves only 38.69% accuracy, while the OpenAI CUA agent reaches just 22.69%.

Shafiq Joty (@jotyshafiq)

We can now say we have a stable data and multi-turn RL training recipe for building autonomous deep research agents. Thanks to the awesome team!

Axel Darmouni (@adarmouni)

Salesforce AI Research published a really cool work a few days ago, showcasing the strength of open source specialization for open-ended multiturn tool use. Specialized through RL Qwen3, QWQ & gpt-oss for DeepResearch tasks. The specialized gpt-oss 20B is even with Deep Research itself 👀

wh (@nrehiew_)

1 of the better DeepResearch papers (from SalesForce!) using the 20B gpt oss model. Really like the analysis comparing gpt oss and the qwen models' agentic degradation tendencies. tldr: 4 tools: browse page, code interpreter, web search, clean_memory which replaces the model current

Eliezer Yudkowsky ⏹️ (@esyudkowsky)

In the limit, there is zero alpha for multiple agents over one agent, on any task, ever. So the Bitter Lesson applies in full to your clever multi-agent framework; it's just you awkwardly trying to hardcode stuff that SGD can better bake into a single agent.

Shafiq Joty (@jotyshafiq)

Proud to share our new work on LLM Verification — the first systematic study of verification asymmetry and its implications for test-time scaling.

Silvio Savarese (@silviocinguetta)

Our latest work on AI verification dynamics is shifting how enterprises allocate compute resources - preventing waste on expensive verifiers that don't add value. Critical for cost-effective AI deployment. Great work Salesforce AI Research and Caiming Xiong arxiv.org/abs/2509.17995