Jiawei Wang (@jarvismsustc)'s Twitter Profile
Jiawei Wang

@jarvismsustc

Joint PhD Candidate @USTC and @MSFTResearch. Intern @MSFTResearch @deepseek_ai @BytedanceTalk

ID: 1460775848718610439

Link: https://jarvisustc.github.io/
Joined: 17-11-2021 01:04:52

28 Tweets

42 Followers

201 Following

Jiawei Wang (@jarvismsustc):

Excited to be part of Eval Sys and contribute to our debut milestone, MCPMark! Follow us for updates, and come join the team—we’re just getting started!
GitHub: github.com/eval-sys/mcpma…
Website: mcpmark.ai
Hugging Face trajectory log: huggingface.co/datasets/Jakum…

Xiangyan Liu (@dobogiyy):

Sharing some of my thoughts when developing, hope they can help 👇
1/ Choosing the initial state defines task diversity, difficulty, and usefulness.
2/ State tracking and management is the trickiest stage. Each MCP needs its own isolation strategy. Worth it though: sandboxing
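The "isolation strategy" point above can be sketched as a snapshot-and-restore pattern: capture an environment's state before the agent runs a task, then roll it back afterwards so every task starts from the same initial state. This is purely illustrative; the names (`EnvState`, `isolated`) are hypothetical and not from the MCPMark codebase.

```python
import copy
from contextlib import contextmanager

class EnvState:
    """Toy stand-in for an MCP server's mutable state (files, DB rows, ...)."""
    def __init__(self):
        self.data = {"notes": ["seed note"]}

@contextmanager
def isolated(env: EnvState):
    snapshot = copy.deepcopy(env.data)   # capture the initial state
    try:
        yield env                        # agent mutates env during the task
    finally:
        env.data = snapshot              # roll back so the next task is clean

env = EnvState()
with isolated(env) as e:
    e.data["notes"].append("agent scribble")

print(env.data["notes"])  # → ['seed note']
```

In practice each MCP backend would need its own snapshot mechanism (container checkpoints, DB transactions, copied workspaces), which is presumably why state management is the trickiest stage.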

Eval Sys (@evalsysorg):

MCPMark Leaderboard Update 🚀 🌟 Qwen-3-Coder takes the #1 spot among open-source models, with an impressive per-run cost of just $36.46. ⚡️ Grok-Code-Fast-1 delivers the lowest per-run cost ($16.08) and the fastest average agent time (156.63s) across the top 10 models.

Jiawei Wang (@jarvismsustc):

Our latest blog uses detailed experiments to dig into a key cause of RL training collapse: training-inference mismatch. We hope it provides a useful reference for your work, and we welcome any discussion around it.😊

Yingru Li (@richardyrli):

Daniel Han, glad you liked the post! You're spot on to suspect lower-level implementation issues. That's exactly what we found in the original blog. The disable_cascade_attn finding (Sec 4.2.4) was the symptom, but the root cause was that silent FlashAttention-2 kernel bug

Yingru Li (@richardyrli):

🚨 UPDATE to the "1 bit per episode" analysis (inspired by John Schulman's post at Thinking Machines): After discussion with mgostIH, I need to point out that the limit only applies to *scalar advantages*! REINFORCE with per-timestep advantages can learn O(T) bits when rewards are
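The distinction in this thread can be sketched in a few lines: a scalar (episode-level) advantage gives every timestep the same learning signal, while per-timestep advantages (here, reward-to-go with no baseline) can give up to T distinct signals per episode. The reward numbers below are made up for illustration only.

```python
# One episode with T = 4 timesteps and made-up per-step rewards.
rewards = [1.0, 0.0, 1.0, 0.0]
gamma = 1.0

# Scalar advantage: one number (the episode return) reused at every timestep,
# so the whole trajectory carries a single piece of feedback.
episode_return = sum(rewards)
scalar_advantages = [episode_return] * len(rewards)

# Per-timestep advantages: reward-to-go at each step gives each timestep
# its own signal (up to T distinct values per episode).
rewards_to_go = []
running = 0.0
for r in reversed(rewards):
    running = r + gamma * running
    rewards_to_go.append(running)
rewards_to_go.reverse()

print(scalar_advantages)  # → [2.0, 2.0, 2.0, 2.0]  (1 distinct value)
print(rewards_to_go)      # → [2.0, 1.0, 1.0, 0.0]  (multiple distinct values)
```

In a REINFORCE-style update, each timestep's log-probability gradient would be weighted by the corresponding advantage, which is where the extra per-timestep information enters the learning signal.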