Zihan Wang - on RAGEN (@wzihanw) 's Twitter Profile
Zihan Wang - on RAGEN

@wzihanw

PhD Student @NorthwesternU. I study PhysiCS of LLM. Ex @deepseek_ai @uiuc_nlp @RUC. Soon @yutori_ai. RAGEN | Chain-of-Experts | ESFT.

ID: 1507697433593098244

Link: http://zihanwang314.github.io · Joined: 26-03-2022 12:34:32

709 Tweets

22.22K Followers

525 Following

Zihan Wang - on RAGEN (@wzihanw) 's Twitter Profile Photo

Excited to be speaking about agents at #GenAIWeek Silicon Valley this Sunday, July 13! Meet us at the panel Agent-to-Human Interfaces & Interactions, 10:50-11:30 AM :)

Manling Li (@manlingli_) 's Twitter Profile Photo

Excited that Ruohan Zhang is joining NU Northwestern University Computer Science! If you are thinking about pursuing a PhD, definitely reach out to him!

During my wonderful year at Stanford AI Lab Stanford Vision and Learning Lab, when I was completely new to robotics, he was the nicest person who was incredibly patient

Zihan Wang - on RAGEN (@wzihanw) 's Twitter Profile Photo

Single-turn data -> multi-turn RL for general reasoning. A simple yet effective method. Worth trying out! Kudos to our amazing intern Licheng Liu. He is applying for PhD positions for Fall '26!

Avi Sil (@aviaviavi__) 's Twitter Profile Photo

Is your LLM getting stuck while training with RL for agentic/reasoning tasks? Well, it turns out that the simple intuition of “trying again” works surprisingly well for reinforcement learning of LLMs and for domain adaptation!! Joint work with Northwestern Engineering Allen School

Manling Li (@manlingli_) 's Twitter Profile Photo

Do you find RL makes the LLM reasoning more stubborn? Keep repeating the same answers?

How can multi-turn conversational history be made helpful in RL training?

We find that simple "try again" feedback can boost reasoning and make RL training conversational!
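For readers curious what such a loop looks like mechanically, here is a minimal sketch of a multi-turn rollout with "try again" feedback. `model.generate`, `check_answer`, and the turn-decayed reward are illustrative assumptions for this sketch, not the paper's actual implementation:

```python
def multi_turn_rollout(model, question, check_answer, max_turns=3):
    """Collect a conversational trajectory: on a wrong answer, append
    simple "try again" feedback to the history and let the model retry."""
    history = [{"role": "user", "content": question}]
    for turn in range(max_turns):
        answer = model.generate(history)  # hypothetical model API
        history.append({"role": "assistant", "content": answer})
        if check_answer(answer):
            # One illustrative choice: reward decays with the turns used.
            return history, 1.0 / (turn + 1)
        history.append({"role": "user", "content": "That is incorrect. Try again."})
    return history, 0.0  # no reward if never correct
```

The returned (trajectory, reward) pair would then feed a standard policy-gradient update, so the conversational history itself becomes part of what the policy learns from.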
Manling Li (@manlingli_) 's Twitter Profile Photo

🏆 Thrilled to receive the ACL 2025 Inaugural Dissertation Award Honorable Mention.

“Multimodality” has moved so incredibly fast that my PhD research already feels like it is from a different era.

It makes me wonder how challenging and anxiety-inducing it is for today’s students to choose thesis
Manling Li (@manlingli_) 's Twitter Profile Photo

Excited to give a talk at the Agentic AI Summit!

Session 4: Foundations of Agents
📍 Frontier Stage
📅 2:45pm PT

Will talk about "RAGEN: Training Agents via Reinforcing Reasoning", including:
- How to monitor RL training to converge for multi-turn LLM agents? RAGEN
Zihan Wang - on RAGEN (@wzihanw) 's Twitter Profile Photo

To anyone diving into fine-tuning open-source MoEs today: check out ESFT, our customized PEFT method for MoE models. Train 90% fewer parameters, gain 95%+ task performance, and keep 98% general performance :)
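As a toy illustration of the expert-selective idea only — the real selection criterion is defined in the ESFT paper, and `expert_relevance` plus the 10% budget here are assumptions for the sketch:

```python
def select_trainable_experts(expert_relevance, budget=0.1):
    """Rank experts by a task-relevance score and keep only the top
    fraction trainable, freezing the rest -- the gist of training
    ~10% of parameters while preserving general performance.

    expert_relevance: dict mapping expert name -> relevance score.
    """
    ranked = sorted(expert_relevance, key=expert_relevance.get, reverse=True)
    k = max(1, int(len(ranked) * budget))  # at least one expert stays trainable
    return set(ranked[:k])
```

In a real MoE model, every parameter outside the returned set would then be frozen (e.g. `requires_grad = False` in PyTorch) before fine-tuning.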

Manling Li (@manlingli_) 's Twitter Profile Photo

All week during rebuttals, I have started each day with the same reminder: stay humble, stay kind, don't let this turn me mean. When I was doing PhD, reviewers never felt this mean. There is a bright-eyed student sitting on the other side, and such reviews will destroy

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

Introducing DeepSeek-V3.1: our first step toward the agent era! 🚀 🧠 Hybrid inference: Think & Non-Think — one model, two modes ⚡️ Faster thinking: DeepSeek-V3.1-Think reaches answers in less time vs. DeepSeek-R1-0528 🛠️ Stronger agent skills: Post-training boosts tool use and

Jay (@jayendra_ram) 's Twitter Profile Photo

Since everyone is talking about RL environments and GRPO now but no one knows how they work, we thought it would be cool to make an explainer video + code you can run. This is an example of using GRPO to train Qwen 2.5 to play 2048 (code in thread) 🧵:
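The core trick behind GRPO is easy to state: sample a group of completions per prompt, then score each one against the group itself rather than against a learned critic. A minimal sketch of that advantage computation (the clipping and KL terms of the full objective are omitted here):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: z-score each sampled completion's
    reward against the group's own mean and standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against all-equal groups
    return [(r - mean) / std for r in rewards]
```

Completions that beat their siblings get positive advantages and are reinforced; the rest are pushed down, with no value network needed.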

Fei-Fei Li (@drfeifei) 's Twitter Profile Photo

(1/N) How close are we to enabling robots to solve the long-horizon, complex tasks that matter in everyday life? 🚨 We are thrilled to invite you to join the 1st BEHAVIOR Challenge @NeurIPS 2025, submission deadline: 11/15. 🏆 Prizes: 🥇 $1,000 🥈 $500 🥉 $300

The Information (@theinformation) 's Twitter Profile Photo

.Yutori's Co-CEO Devi Parikh on why she left Meta to start her company with Abhishek Das. “We are rethinking what the interface with the web looks like, and that requires a certain sort of… thinking from first principles, thinking from scratch, thinking from a clean

Jyo Pari (@jyo_pari) 's Twitter Profile Photo

For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇

Manling Li (@manlingli_) 's Twitter Profile Photo

Check out the 1st BEHAVIOR Challenge, co-hosted with our Foundation Models for Embodied Agent Challenge at NeurIPS …models-meet-embodied-agents.github.io/behavior_chall… When I first moved my focus from LLMs/VLMs toward embodied agents, I expected the biggest challenges would be around perception, motor

Zihan Wang - on RAGEN (@wzihanw) 's Twitter Profile Photo

Keyframe search has been there for Video LLMs, but humans don't glance at frames 📸 - we skip to key moments & trace them along. This is why we introduce Temporal Visual Screening: a coherent, natural info filtering schema for Video LLMs! Tons of interesting findings 👇🏻

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model! ✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context. 👉 Now live on App, Web, and API. 💰 API prices cut by 50%+! 1/n