Shafiq Joty (@jotyshafiq) Twitter Tweets • TwiCopy

The models and code are now public! Models on HF: huggingface.co/collections/Sa… Code: github.com/SalesforceAIRe… Project Page: salesforceairesearch.github.io/SweRank/ If you are interested in integrating the SweRank models as a plug-in within VS Code, please do reach out! We have more exciting

Likes: 24 · Replies: 0 · Reposts: 9

Salesforce AI Research

@sfresearch

4 months ago

📣 SweRank Models & Code - Now Public! 📣 Following our recent announcement about AI-powered software issue localization, we’re excited to share: ✅ Models: huggingface.co/collections/Sa… ✅ Code: github.com/SalesforceAIRe… 📄 Paper: bit.ly/3S0x1fV 🎯 Demo:

Likes: 5 · Replies: 0 · Reposts: 2

Salesforce AI Research

@sfresearch

4 months ago

🎉 Excited to share that 4 papers have been accepted to Transactions on Machine Learning Research! Follow Salesforce AI Research and bookmark this thread to stay updated on our latest #AI research.

📝 Shared Imagination: LLMs Hallucinate Alike
🖇️ arxiv.org/html/2407.1660…

📊 A

Likes: 12 · Replies: 0 · Reposts: 3

Shafiq Joty

@jotyshafiq

4 months ago

Awesome work on understanding RL for LLM reasoning. Please check it out.

Likes: 9 · Replies: 0 · Reposts: 1

Salesforce AI Research

@sfresearch

3 months ago

🇨🇦 Excited to present our work at Conference on Language Modeling in Montreal! Oct 7-10 at Palais des Congrès!

📄 Our accepted papers:

CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval
👥 Authors: Ye Liu, Rui Meng, Shafiq Joty, Silvio Savarese

Likes: 20 · Replies: 0 · Reposts: 3

Salesforce AI Research

@sfresearch

2 months ago

🌟 Excited to present our work at Empirical Methods in Natural Language Processing (EMNLP 2025) - a leading conference in NLP and AI research!

📄 Our accepted papers:

Topic-Guided Reinforcement Learning with LLMs for Enhancing Multi-Document Summarization
👥 Authors: Chuyuan Li

Likes: 62 · Replies: 1 · Reposts: 14

Salesforce AI Research

@sfresearch

2 months ago

EMNLP 2025 / #EMNLP2025 Accepted Paper: Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text

📝 Paper: arxiv.org/abs/2507.19969

This work introduces Text2Vis, a comprehensive benchmark for evaluating text-to-visualization models

Likes: 3 · Replies: 0 · Reposts: 2

Shafiq Joty

@jotyshafiq

2 months ago

Even the top-performing agents struggle on DashBoardQA. The best agent based on Gemini-Pro-2.5 achieves only 38.69% accuracy, while the OpenAI CUA agent reaches just 22.69%.

Likes: 4 · Replies: 0 · Reposts: 1

Shafiq Joty

@jotyshafiq

2 months ago

We can now say we have a stable data and multi-turn RL training recipe for building autonomous deep research agents. Thanks to the awesome team!

Likes: 17 · Replies: 0 · Reposts: 7

Shafiq Joty

@jotyshafiq

2 months ago

Thanks for the shout out!

Likes: 1 · Replies: 0 · Reposts: 0

Shafiq Joty

@jotyshafiq

2 months ago

Thanks for the nice words.

Likes: 2 · Replies: 0 · Reposts: 0

Axel Darmouni

@adarmouni

2 months ago

Salesforce AI Research published a really cool work a few days ago, showcasing the strength of open source specialization for open-ended multiturn tool use

Specialized through RL Qwen3, QWQ & gpt-oss for DeepResearch tasks

The specialized gpt-oss 20B is even with Deep Research itself 👀

Likes: 3 · Replies: 0 · Reposts: 1

dinos

@din0s_

2 months ago

solid work & a great paper, check it out

Likes: 3 · Replies: 0 · Reposts: 1

wh

@nrehiew_

2 months ago

1 of the better DeepResearch papers (from Salesforce!) using the 20B gpt oss model. Really like the analysis comparing gpt oss and the qwen models' agentic degradation tendencies. tldr: 4 tools: browse page, code interpreter, web search, clean_memory which replaces the model current

Likes: 168 · Replies: 3 · Reposts: 20

Eliezer Yudkowsky ⏹️

@esyudkowsky

a month ago

In the limit, there is zero alpha for multiple agents over one agent, on any task, ever. So the Bitter Lesson applies in full to your clever multi-agent framework; it's just you awkwardly trying to hardcode stuff that SGD can better bake into a single agent.

Likes: 400 · Replies: 72 · Reposts: 36

Shafiq Joty

@jotyshafiq

a month ago

Proud to share our new work on LLM Verification — the first systematic study of verification asymmetry and its implications for test-time scaling.

Likes: 7 · Replies: 0 · Reposts: 1

Silvio Savarese

@silviocinguetta

a month ago

Our latest work on AI verification dynamics is shifting how enterprises allocate compute resources - preventing waste on expensive verifiers that don't add value. Critical for cost-effective AI deployment. Great work Salesforce AI Research and Caiming Xiong arxiv.org/abs/2509.17995

Likes: 7 · Replies: 0 · Reposts: 3