Yuan He (@lawhy_x)'s Twitter Profile
Yuan He

@lawhy_x

Applied Scientist at Amazon Rufus | PhD @CompSciOxford | Contributing to open source @CamelAIOrg

ID: 1468998651171196929

Link: https://www.yuanhe.wiki/

Joined: 09-12-2021 17:39:39

23 Tweets

54 Followers

53 Following

SEA Workshop (@seaworkshop)

Congrats to the following paper authors attaining Outstanding Paper Awards at SEA Workshop!

RPGBENCH: Evaluating Large Language Models as Role-Playing Game Engines

Pengfei Yu, Dongming Shen, Silin Meng, Jaewon Lee, Weisu Yin, Andrea Yaoyun Cui, Zhenlin Xu, Yi Zhu, Xingjian Shi,
SEA Workshop (@seaworkshop)

Congrats to the following paper authors attaining Outstanding Paper Awards at SEA Workshop!

GEM: A Gym for Agentic LLMs

Zichen Liu, Anya Sims, Keyu Duan, Changyu Chen, Haotian Xu, Simon Yu, Chenmien Tan, Shaopan Xiong, Weixun Wang, Bo Liu, Hao Zhu, Weiyan Shi, Diyi Yang, Wee
SEA Workshop (@seaworkshop)

The best poster awards go to:

1. Go-Browse: Training Web Agents with Structured Exploration
Apurva Gandhi, Graham Neubig

2. Scaling Open-Ended Reasoning to Predict the Future
Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping

🎉Congrats!
Yuan He (@lawhy_x)

The SEA Workshop at NeurIPS 2025 was a tremendous success, bringing together frontier discussions on building and scaling agent environments. We were fortunate to have outstanding participants, speakers and panelists (Edward Grefenstette, Mike A. Merrill, Grégoire Mialon, Deepak Nathani @ NeurIPS 2025
Snorkel AI (@snorkelai)

ICYMI: the Terminal-Bench creators just laid out what actually matters for agent evaluation.

Terminals > GUIs
Containers for real rollouts
TB 2.0 = harder tasks + deeper verification

Bonnie Li (@bonniesjli)

Can AI self-improve on its own and reach superhuman performance? 🧠

In our SIMA 2 paper, we dropped a Gemini agent into an unseen 3D world. The model acted as the task proposer, the agent, and the reward model, autonomously learning from self-generated experience. It surpassed
Christopher Manning (@chrmanning)

Great to see an AI lab doing and publishing science (as well as discussing engineering efficiencies)! Some of the other “frontier” labs should try it! Thx, DeepSeek!

Zhenghao Xu (@zhenghaoxu0)

Kimi.ai used policy mirror descent (PMD) for RL in Kimi k1.5/k2. Most take it simply as PG+KL with an updating anchor, but this is not the full story. Check our blog for some interesting findings about this algorithm: zhenghaoxu.notion.site/Revisiting-Kim…
Yuan He (@lawhy_x)

Claude Code is evolving from “you’re absolutely right” to having real stance. Good code comes from arguments. Humans bring taste and system design; agents write and debug. There’s a growing illusion that code generation equals shipping. Without iteration, constraints, and