Zihan Wang - on RAGEN (@wzihanw) 's Twitter Profile
Zihan Wang - on RAGEN

@wzihanw

PhD Student @NorthwesternU. I study PhysiCS of LLM. Ex @deepseek_ai @uiuc_nlp @RUC. Soon @yutori_ai. RAGEN | Chain-of-Experts | ESFT.

ID: 1507697433593098244

Link: http://zihanwang314.github.io · Joined: 26-03-2022 12:34:32

709 Tweets

22.22K Followers

525 Following

Zihan Wang - on RAGEN (@wzihanw) 's Twitter Profile Photo

Excited to be speaking about agents at #GenAIWeek Silicon Valley this Sunday, July 13! Meet us at the panel Agent-to-Human Interfaces & Interactions, 10:50-11:30 AM :)

Manling Li (@manlingli_) 's Twitter Profile Photo

Excited that Ruohan Zhang is joining NU Northwestern University Computer Science! If you are thinking about pursuing a PhD, definitely reach out to him!

During my wonderful year at Stanford AI Lab Stanford Vision and Learning Lab, when I was completely new to robotics, he was the nicest person who was incredibly patient

Zihan Wang - on RAGEN (@wzihanw) 's Twitter Profile Photo

Single-turn data -> multi-turn RL for general reasoning. A simple yet effective method. Worth trying out! Kudos to our amazing intern Licheng Liu. He is applying for PhD positions for Fall '26!

Avi Sil (@aviaviavi__) 's Twitter Profile Photo

Is your LLM getting stuck while training with RL for agentic/reasoning tasks? Well, it turns out that the simple intuition of “trying again” works surprisingly well for reinforcement learning of LLMs and for domain adaptation!! Joint work with Northwestern Engineering Allen School

Manling Li (@manlingli_) 's Twitter Profile Photo

Do you find RL makes the LLM reasoning more stubborn? Keep repeating the same answers?

How can multi-turn conversational history be made helpful in RL training?

We find that simple "try again" feedback can boost reasoning and make RL training conversational!
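For readers curious what such a loop looks like mechanically, here is a minimal sketch of a multi-turn rollout with "try again" feedback. `model.generate`, `check_answer`, and the turn-decayed reward are illustrative assumptions for this sketch, not the paper's actual implementation:

```python
def multi_turn_rollout(model, question, check_answer, max_turns=3):
    """Collect a conversational trajectory: on a wrong answer, append
    simple "try again" feedback to the history and let the model retry."""
    history = [{"role": "user", "content": question}]
    for turn in range(max_turns):
        answer = model.generate(history)  # hypothetical model API
        history.append({"role": "assistant", "content": answer})
        if check_answer(answer):
            # One illustrative choice: reward decays with the turns used.
            return history, 1.0 / (turn + 1)
        history.append({"role": "user", "content": "That is incorrect. Try again."})
    return history, 0.0  # no reward if never correct
```

The returned (trajectory, reward) pair would then feed a standard policy-gradient update, so the conversational history itself becomes part of what the policy learns from.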
Manling Li (@manlingli_) 's Twitter Profile Photo

🏆 Thrilled to receive the ACL 2025 Inaugural Dissertation Award Honorable Mention.

“Multimodality” has moved so incredibly fast that my PhD research already feels like it is from a different era.

It makes me wonder how challenging and anxiety-inducing it is for today’s students to choose thesis
Manling Li (@manlingli_) 's Twitter Profile Photo

Excited to give a talk at the Agentic AI Summit!

Session 4: Foundations of Agents
📍 Frontier Stage
📅 2:45pm PT

Will talk about "RAGEN: Training Agents via Reinforcing Reasoning", including:
- How to monitor RL training to converge for multi-turn LLM agents? RAGEN
Zihan Wang - on RAGEN (@wzihanw) 's Twitter Profile Photo

To anyone diving into fine-tuning open-source MoEs today: check out ESFT, our customized PEFT method for MoE models. Train 90% fewer parameters, gain 95%+ task performance, and keep 98% general performance :)
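As a toy illustration of the expert-selective idea only — the real selection criterion is defined in the ESFT paper, and `expert_relevance` plus the 10% budget here are assumptions for the sketch:

```python
def select_trainable_experts(expert_relevance, budget=0.1):
    """Rank experts by a task-relevance score and keep only the top
    fraction trainable, freezing the rest -- the gist of training
    ~10% of parameters while preserving general performance.

    expert_relevance: dict mapping expert name -> relevance score.
    """
    ranked = sorted(expert_relevance, key=expert_relevance.get, reverse=True)
    k = max(1, int(len(ranked) * budget))  # at least one expert stays trainable
    return set(ranked[:k])
```

In a real MoE model, every parameter outside the returned set would then be frozen (e.g. `requires_grad = False` in PyTorch) before fine-tuning.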

Manling Li (@manlingli_) 's Twitter Profile Photo

All week during rebuttals, I have started each day with the same reminder: stay humble, stay kind, don't let this turn me mean. When I was doing PhD, reviewers never felt this mean. There is a bright-eyed student sitting on the other side, and such reviews will destroy

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

Introducing DeepSeek-V3.1: our first step toward the agent era! 🚀 🧠 Hybrid inference: Think & Non-Think — one model, two modes ⚡️ Faster thinking: DeepSeek-V3.1-Think reaches answers in less time vs. DeepSeek-R1-0528 🛠️ Stronger agent skills: Post-training boosts tool use and

Jay (@jayendra_ram) 's Twitter Profile Photo

Since everyone is talking about RL environments and GRPO now but no one knows how they work, we thought it would be cool to make an explainer video + code you can run. This is an example of using GRPO to train Qwen 2.5 to play 2048 (code in thread) 🧵:
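The core trick behind GRPO is easy to state: sample a group of completions per prompt, then score each one against the group itself rather than against a learned critic. A minimal sketch of that advantage computation (the clipping and KL terms of the full objective are omitted here):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: z-score each sampled completion's
    reward against the group's own mean and standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against all-equal groups
    return [(r - mean) / std for r in rewards]
```

Completions that beat their siblings get positive advantages and are reinforced; the rest are pushed down, with no value network needed.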

Fei-Fei Li (@drfeifei) 's Twitter Profile Photo

(1/N) How close are we to enabling robots to solve the long-horizon, complex tasks that matter in everyday life? 🚨 We are thrilled to invite you to join the 1st BEHAVIOR Challenge @NeurIPS 2025, submission deadline: 11/15. 🏆 Prizes: 🥇 $1,000 🥈 $500 🥉 $300

The Information (@theinformation) 's Twitter Profile Photo

.Yutori's Co-CEO Devi Parikh on why she left Meta to start her company with Abhishek Das. “We are rethinking what the interface with the web looks like, and that requires a certain sort of… thinking from first principles, thinking from scratch, thinking from a clean

Jyo Pari (@jyo_pari) 's Twitter Profile Photo

For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇

Manling Li (@manlingli_) 's Twitter Profile Photo

Check out the 1st BEHAVIOR Challenge, co-hosted with our Foundation Models for Embodied Agent Challenge at NeurIPS …models-meet-embodied-agents.github.io/behavior_chall… When I first moved my focus from LLMs/VLMs toward embodied agents, I expected the biggest challenges would be around perception, motor

Zihan Wang - on RAGEN (@wzihanw) 's Twitter Profile Photo

Keyframe search has been there for Video LLMs, but humans don't glance at frames 📸 - we skip to key moments & trace them along. This is why we introduce Temporal Visual Screening: a coherent, natural info filtering schema for Video LLMs! Tons of interesting findings 👇🏻

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model! ✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context. 👉 Now live on App, Web, and API. 💰 API prices cut by 50%+! 1/n