Shaokun Zhang (@shaokunzhang1)'s Twitter Profile
Shaokun Zhang

@shaokunzhang1

PhD student @PennState. Co-Creator of #AutoGen | Research Intern @NvidiaAI @MSFTResearch

Link: https://skzhang1.github.io · Joined: 11-04-2021 16:38:23

38 Tweets

187 Followers

477 Following

Rohan Paul (@rohanpaul_ai):

Cool paper from NVIDIA

Prior methods for training LLMs for tool use rely on imitation or distilled reasoning, limiting generalization.

Nemotron-Research-Tool-N1 uses rule-based reinforcement learning.

It trains models with binary rewards evaluating only tool call structure
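
The tweet is truncated, but the core mechanic it describes — a rule-based binary reward that scores only whether the tool call is structurally correct — can be sketched in a few lines. The JSON schema and matching rule below are illustrative assumptions, not the paper's exact specification:

```python
import json

def binary_tool_reward(model_output: str, reference_call: dict) -> float:
    """Rule-based binary reward: 1.0 only if the predicted tool call parses
    and matches the reference tool name and arguments, else 0.0.
    (Schema and matching rule are assumptions, not the paper's exact spec.)"""
    try:
        predicted = json.loads(model_output)      # the call must be valid JSON
    except json.JSONDecodeError:
        return 0.0
    if predicted.get("name") != reference_call.get("name"):
        return 0.0                                # wrong tool selected
    if predicted.get("arguments") != reference_call.get("arguments"):
        return 0.0                                # wrong arguments
    return 1.0                                    # structurally correct call
```

A reward like this gives the policy no credit for the reasoning text itself, which is what lets the model learn its own reasoning without trace supervision.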
Shaokun Zhang (@shaokunzhang1):

Tool-using LLMs can learn to reason—without reasoning traces.

🔥 We present Nemotron-Research-Tool-N1, a family of tool-using reasoning LLMs trained entirely via rule-based reinforcement learning—no reasoning supervision, no distillation.

📄 Paper: arxiv.org/pdf/2505.00024
💻
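
As a companion to the reward sketch above, here is a tiny parser for the kind of "reason first, then emit a tool call" rollout format such a model might be trained to produce; the `<think>`/`<tool_call>` tag names are assumptions for illustration, not necessarily the paper's exact template:

```python
import json
import re

# Hypothetical rollout format: free-form reasoning inside <think> tags followed
# by a JSON tool call inside <tool_call> tags. Tag names are illustrative.
ROLLOUT_PATTERN = re.compile(
    r"<think>(.*?)</think>\s*<tool_call>(.*?)</tool_call>", re.DOTALL
)

def parse_rollout(text: str):
    """Return (reasoning, tool_call_dict), or None if the rollout is malformed.
    A malformed rollout would simply earn reward 0 under a rule-based check."""
    match = ROLLOUT_PATTERN.search(text)
    if match is None:
        return None
    reasoning = match.group(1).strip()
    try:
        call = json.loads(match.group(2))
    except json.JSONDecodeError:
        return None
    return reasoning, call
```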
马东锡 NLP 🇸🇪 (@dongxi_nlp):

「Nvidia, Reasoning, Agent」

Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning

Nemotron-Tool-N1 deepens the RLVR idea along the tool-calling dimension, letting 7B / 14B models comprehensively outperform GPT-4o on tool-use benchmarks. Excellent work!
Linxin Song (@linxins2):

🚨 We discovered a surprising side effect of Reinforcement Finetuning (RFT): it makes LLMs more confidently wrong on unanswerable questions.
We call this the hallucination tax: a drop in refusal behavior that leads to overconfident hallucinations.

🧵 1/n
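
One simple way to measure the refusal drop the thread describes is to track the fraction of responses to unanswerable questions that abstain. The marker list and scoring below are a rough heuristic, not the paper's evaluation protocol:

```python
# Heuristic refusal detector; the marker list is an assumption for illustration.
REFUSAL_MARKERS = ("i don't know", "cannot be determined", "not enough information")

def refusal_rate(answers: list[str]) -> float:
    """Fraction of answers to unanswerable questions that abstain rather than
    commit to a (necessarily unsupported) answer."""
    refusals = sum(any(m in a.lower() for m in REFUSAL_MARKERS) for a in answers)
    return refusals / max(len(answers), 1)

# A refusal_rate that falls after RFT is the "hallucination tax": the model
# answers confidently where it should have abstained.
```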
Sean Xuefeng Du (@xuefeng_du):

Excited to join the College of Computing and Data Science at Nanyang Technological University, Singapore (NTU Singapore) as an Assistant Professor this fall! 🙌

Grateful to my advisor Sharon Y. Li and all who supported me along the way. Looking forward to the new chapter! 😄 🇸🇬
Shizhe Diao (@shizhediao):

Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough!

Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering
Zhiding Yu (@zhidingyu):

Document and Enterprise Intelligence is arguably one of the most important applications of VLMs and cloud services. NVIDIA VLM technologies help build commercial-grade models excelling in this area. The Eagle VLM Team, together with other colleagues at NVIDIA, are proud to be

Shaokun Zhang (@shaokunzhang1):

Great follow-up to our failure attribution research with the Who&When benchmark: arxiv.org/abs/2505.00212. Multi-agent debugging is crucial for improvement.

Shaokun Zhang (@shaokunzhang1):

Happy to see “on-demand tool loading” becoming a trend. We explored a similar idea last year. Our 2024 work EcoAct (arxiv.org/pdf/2411.01643) proposed letting LLMs dynamically decide when/how to register tools instead of loading everything upfront, using a similar “tool_register”
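
A minimal sketch of the on-demand registration idea, assuming the agent starts with only a registration tool exposed; the names and schemas below are illustrative, not EcoAct's actual API:

```python
# Hypothetical on-demand tool registry: the LLM starts with only `tool_register`
# exposed and pulls in other tool schemas when it decides it needs them,
# instead of receiving every tool description up front.
AVAILABLE_TOOLS = {
    "web_search": {"description": "Search the web", "parameters": {"query": "string"}},
    "calculator": {"description": "Evaluate arithmetic", "parameters": {"expr": "string"}},
}

active_tools: dict[str, dict] = {}   # schemas currently visible to the model

def tool_register(name: str) -> dict:
    """Called by the model mid-conversation to load a tool on demand."""
    if name not in AVAILABLE_TOOLS:
        return {"error": f"unknown tool: {name}"}
    active_tools[name] = AVAILABLE_TOOLS[name]
    return {"registered": name, "schema": AVAILABLE_TOOLS[name]}
```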

DeepSeek (@deepseek_ai):

🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale — Reasoning-first models built for agents!

🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API.
🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now.

📄 Tech
Dylan (@dylan_works_):

Inspired by on-policy RL, we rethink the standard SFT data recipe. We hypothesize that the model should agree on the SFT target to boost training effectiveness.

We will present our NeurIPS spotlight paper The Best Instruction-Tuning Data are Those That Fit: An extremely simple &
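
One way to operationalize "the model should agree on the SFT target" is to score each candidate example by the base model's average log-probability of its target and keep the best-fitting ones. The scoring code below (using Hugging Face transformers, with "gpt2" as a stand-in model) is an illustrative reading, not necessarily the paper's exact criterion:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM the SFT run would start from
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def target_logprob(prompt: str, target: str) -> float:
    """Average log-probability the base model assigns to the SFT target,
    conditioned on the prompt. Higher means the model already 'agrees' more."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    target_ids = tokenizer(target, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, target_ids], dim=1)
    logits = model(ids).logits[:, :-1, :]              # next-token predictions
    logprobs = torch.log_softmax(logits, dim=-1)
    token_lp = logprobs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # keep only the positions that predict target tokens
    return token_lp[:, prompt_ids.shape[1] - 1:].mean().item()

# Filtering idea (threshold is illustrative): keep the examples whose targets
# the base model already fits best, drop the rest of the SFT pool.
```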
Qingyun Wu (@qingyun_wu):

🚀 So excited to be launching AG2 AgentOS — the operating fabric for AI-native teams. Meet the AG2 Universal Assistant: 🔍 Finds & delegates to the right agents ⚙️ Composes workflows with or without you 🔗 Runs complex multi-agent workflows anywhere via A2A 🌐 Builds your org’s