Zhoujun (Jorge) Cheng
@chengzhoujun
CS Ph.D. @UCSanDiego | Prev. @XLangNLP @MSFTResearch @sjtu1896
ID: 1462759457528553482
http://blankcheng.github.io 22-11-2021 12:27:18
190 Tweet
715 Followers
524 Following
Failing on large-scale RL with VeRL? ⚠️ Mixing inference backend (vLLM/SGLang) with training backends (FSDP/Megatron) secretly turns your RL into off-policy, even if they share the same weights! Blog:
⚡ FP8 makes RL faster, but at the cost of performance. We present FlashRL, the first open-source & working RL recipe that applies INT8/FP8 for rollout without losing performance compared to BF16! Blog:
[0/3] Introducing Verlog, an open-source RL framework built specifically for training long-horizon, multi-turn LLM agents. Max episode length comparison: • VeRL / RAGEN: ~10 turns • verl-agent: ~50 turns • Verlog (ours): 400+ turns. Technical foundation:
Super excited to launch Mirage 2. A big leap toward a general-purpose world engine for live interactive play. Hard to believe how far we've come in just one month since Mirage 1. If you're impressed by Genie 3, come play with Mirage 2. It's live, offering an extended
Excited to share my 1st project as a Research Scientist Intern at Meta FAIR! Grateful to my mentor Jiawei Zhao for guidance, and to Yuandong Tian & Xuewei for their valuable advice and collaboration. Our work DeepConf explores local confidence for more accurate & efficient LLM reasoning!