Yuan He (@lawhy_x) Twitter Tweets • TwiCopy

Yuan He

@lawhy_x

Applied Scientist at Amazon Rufus | PhD @CompSciOxford | Contributing to open source @CamelAIOrg

ID: 1468998651171196929

Link: https://www.yuanhe.wiki/ • Joined: 09-12-2021 17:39:39

23 Tweets

54 Followers

53 Following

Yuan He

@lawhy_x

5 months ago

Great talk, Deepak Nathani!

Likes: 0 • Replies: 0 • Reposts: 0

Congrats to the following paper authors attaining Outstanding Paper Awards at SEA Workshop!

RPGBENCH: Evaluating Large Language Models as Role-Playing Game Engines

Pengfei Yu, Dongming Shen, Silin Meng, Jaewon Lee, Weisu Yin, Andrea Yaoyun Cui, Zhenlin Xu, Yi Zhu, Xingjian Shi,

Likes: 9 • Replies: 3 • Reposts: 2

SEA Workshop

@seaworkshop

5 months ago

Congrats to the following paper authors attaining Outstanding Paper Awards at SEA Workshop!

GEM: A Gym for Agentic LLMs

Zichen Liu, Anya Sims, Keyu Duan, Changyu Chen, Haotian Xu, Simon Yu, Chenmien Tan, Shaopan Xiong, Weixun Wang, Bo Liu, Hao Zhu, Weiyan Shi, Diyi Yang, Wee

Likes: 22 • Replies: 3 • Reposts: 4

SEA Workshop

@seaworkshop

5 months ago

The best poster awards go to:

1. Go-Browse: Training Web Agents with Structured Exploration
   Apurva Gandhi, Graham Neubig
2. Scaling Open-Ended Reasoning to Predict the Future
   Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping

🎉 Congrats!

Likes: 20 • Replies: 2 • Reposts: 8

vincent sunn chen

@vincentsunnchen

5 months ago

Kudos to all the speakers/panelists (Edward Grefenstette, Mike A. Merrill, Grégoire Mialon, Deepak Nathani @ NeurIPS 2025, Joseph Marino, Shuyan Zhou🛸NeurIPS, Qian Huang, Anthony G. Cohn, Eric Sommerlade, Fred Sala) and organizers (Guohao Li 🐫, Yuan He @ NeurIPS 2025, May Fung (hiring postdocs), Qingyun Wang, Fangru Lin @NeurIPS, Xingyue Huang @ NeurIPS 25, Alisia Lupidi,

Likes: 11 • Replies: 0 • Reposts: 3

Yuan He

@lawhy_x

5 months ago

The SEA Workshop at NeurIPS 2025 was a tremendous success, bringing together frontier discussions on building and scaling agent environments. We were fortunate to have outstanding participants, speakers and panelists (Edward Grefenstette Mike A. Merrill Grégoire Mialon Deepak Nathani @ NeurIPS 2025

Likes: 20 • Replies: 0 • Reposts: 4

Snorkel AI

@snorkelai

5 months ago

ICYMI — the Terminal-Bench creators just laid out what actually matters for agent evaluation. Terminals > GUIs Containers for real rollouts TB 2.0 = harder tasks + deeper verification

Likes: 106 • Replies: 6 • Reposts: 23

Bonnie Li

@bonniesjli

5 months ago

Can AI self-improve on its own and reach superhuman performance? 🧠 In our Sima 2 paper, we dropped a Gemini agent into an unseen 3D world. The model acted as the task proposer, the agent, and the reward model - autonomously learning from self-generated experience. It surpassed

Likes: 1.1K • Replies: 68 • Reposts: 167

Christopher Manning

@chrmanning

4 months ago

Great to see an AI lab doing and publishing science (as well as discussing engineering efficiencies)! Some of the other “frontier” labs should try it! Thx, DeepSeek!

Likes: 1.1K • Replies: 25 • Reposts: 88

Yuan He

@lawhy_x

3 months ago

Couldn’t agree more

Likes: 0 • Replies: 0 • Reposts: 0

Zhenghao Xu

@zhenghaoxu0

3 months ago

Kimi.ai used policy mirror descent (PMD) for RL in Kimi k1.5/k2. Most take it simply as PG+KL with an updating anchor, but this is not the full story. Check our blog for some interesting findings about this algorithm: zhenghaoxu.notion.site/Revisiting-Kim…

Likes: 38 • Replies: 1 • Reposts: 9

vincent sunn chen

@vincentsunnchen

3 months ago

x.com/i/article/2021…

Likes: 313 • Replies: 16 • Reposts: 78

Yuan He

@lawhy_x

2 months ago

It’s Chinese new year eve🤣.

Likes: 1 • Replies: 0 • Reposts: 0

Yuan He

@lawhy_x

a month ago

Congrats for the milestone!

Likes: 8 • Replies: 1 • Reposts: 0

Yuan He

@lawhy_x

a month ago

Thinking in the “right” amount - how to shape this reward signal is challenging.

Likes: 0 • Replies: 0 • Reposts: 0

Yuan He

@lawhy_x

a month ago

Claude Code is evolving from “you’re absolutely right” to having real stance. Good code comes from arguments. Humans bring taste and system design; agents write and debug. There’s a growing illusion that code generation equals shipping. Without iteration, constraints, and

Likes: 2 • Replies: 0 • Reposts: 0