Zhoujun (Jorge) Cheng (@chengzhoujun)'s Twitter Profile
Zhoujun (Jorge) Cheng

@chengzhoujun

CS Ph.D. @UCSanDiego | Prev. @XLangNLP @MSFTResearch @sjtu1896

ID: 1462759457528553482

Link: http://blankcheng.github.io | Joined: 22-11-2021 12:27:18

190 Tweets

715 Followers

524 Following

Tianbao Xie (@tianbaox):

Countless iterations went into cooking it, but the process is satisfying. I still believe we can pour more data into each stage if we have more hands, so the potential is unlimited and the scaling law hasn't hit the wall yet! Towards Digital Agents 🤖 We are already on the way.

Qian Liu (@sivil_taram):

Wrapped up a SWE-Perf website redesign using Qwen3-Coder on AnyCoder (huggingface.co/spaces/akhaliq…). The process was incredibly fast and great!

One question for Qwen devs, though: did you pretrain a secret love for the color purple into the coder's persona? 😉
Chujie Zheng (@chujiezheng):

Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀

📄 huggingface.co/papers/2507.18…
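The tweet only announces GSPO; as a rough, unofficial illustration of what a sequence-level (rather than per-token) clipped importance ratio can look like, here is a minimal sketch. The function name, tensor shapes, length-normalized ratio, and clipping range are all assumptions for illustration, not the paper's code.

```python
import torch

def sequence_level_clipped_loss(logp_new, logp_old, advantages, mask, eps=0.2):
    """Illustrative sequence-level clipped objective (not the official GSPO code).

    logp_new, logp_old: (batch, seq_len) per-token log-probs under the current
    and rollout policies; mask: (batch, seq_len) float mask over response tokens;
    advantages: (batch,) one advantage per sampled sequence (e.g. group-normalized).
    """
    lengths = mask.sum(dim=-1).clamp(min=1.0)
    # Length-normalized sequence-level ratio:
    # ratio_i = (pi_new(y_i | x) / pi_old(y_i | x)) ** (1 / |y_i|)
    log_ratio = ((logp_new - logp_old) * mask).sum(dim=-1) / lengths
    ratio = log_ratio.exp()
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    # PPO-style pessimistic objective, applied per sequence instead of per token.
    return -torch.minimum(ratio * advantages, clipped * advantages).mean()
```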
Feng Yao (@fengyao1909):

Failing on large-scale RL with VeRL?

⚠️ Mixing an inference backend (vLLM/SGLang) with training backends (FSDP/Megatron) secretly turns your RL into off-policy, even if they share the same weights!

📉 Blog:
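For context on the fix referenced later in this feed (the TIS/FlashRL update below), a common remedy is to reweight the policy gradient by a capped ratio between the trainer's and the rollout engine's token probabilities. The snippet below is a minimal sketch of that idea under assumed tensor shapes; it is not the FlashRL implementation.

```python
import torch

def truncated_is_pg_loss(logp_train, logp_rollout, advantages, mask, ratio_cap=2.0):
    """Illustrative truncated importance sampling (TIS) correction.

    logp_train: (batch, seq_len) token log-probs recomputed by the training backend.
    logp_rollout: (batch, seq_len) token log-probs returned by the inference engine.
    Even with identical weights, numerics differ, so the rollout data is mildly
    off-policy; reweighting by a capped ratio corrects the gradient estimate.
    """
    ratio = (logp_train.detach() - logp_rollout).exp()   # pi_train / pi_rollout per token
    ratio = torch.clamp(ratio, max=ratio_cap)            # truncate to bound variance
    weighted = ratio * advantages.unsqueeze(-1) * logp_train * mask
    return -weighted.sum() / mask.sum()
```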
Jinjie Ni @ ICLR'25 🇸🇬 (@nijinjie):

Token crisis: solved. ✅

We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch, up to 8B params, 480B tokens, 480 epochs.

Findings:
> DLMs beat AR when tokens are limited, with >3× data potential.
> A 1B DLM trained on just 1B tokens
Tianbao Xie (@tianbaox):

🚀 OSWorld gets a major upgrade! OSWorld-Verified: 15 months of community feedback → 300+ fixes (ambiguity, graders…), 50x faster eval through AWS parallelization. More apples-to-apples comparison for reliable CUA evaluation ✨ 👇 xlang.ai/blog/osworld-v…

Feng Yao (@fengyao1909):

⚡ FP8 makes RL faster, but at the cost of performance.

We present FlashRL, the first open-source & working RL recipe that applies INT8/FP8 for rollout without losing performance compared to BF16!

📝 Blog:
Zhoujun (Jorge) Cheng (@chengzhoujun):

Yes, brutally true. I tend to see LLM RL ≈ on-policy self-distilled SFT with reward re-weighting. The key differences between LLM SFT (rejection sampling) and RL are: 1. Negative examples, or more precisely, advantage-weighted samples. 2. On-policyness: even iterated SFT is more
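The analogy in the tweet can be made concrete with a toy loss. The sketch below (all names and shapes are illustrative assumptions, not from any specific codebase) shows rejection-sampling SFT as the special case where samples are weighted 0/1 by reward, while the RL view weights every on-policy sample by a possibly negative advantage.

```python
import torch

def weighted_nll(logp, mask, weights):
    """Per-sequence negative log-likelihood, weighted per sample.

    logp: (batch, seq_len) token log-probs of sampled responses under the model.
    mask: (batch, seq_len) response-token mask. weights: (batch,) per-sample weight.
    """
    seq_logp = (logp * mask).sum(dim=-1)
    return -(weights * seq_logp).mean()

# Rejection-sampling SFT: keep only samples whose reward passes a threshold;
# every kept sample gets the same non-negative weight.
def rejection_sampling_sft_loss(logp, mask, rewards, threshold=0.5):
    weights = (rewards > threshold).float()
    return weighted_nll(logp, mask, weights)

# "RL as reward-reweighted SFT": every fresh on-policy sample contributes,
# weighted by its advantage, which can be negative (pushing probability down).
def policy_gradient_loss(logp, mask, rewards):
    advantages = rewards - rewards.mean()   # simple mean-reward baseline
    return weighted_nll(logp, mask, advantages)
```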

Wen-Tse Chen (@wenzechen2):

[0/3] 🚀 Introducing Verlog, an open-source RL framework built specifically for training long-horizon, multi-turn LLM agents.

📊 Max episode length comparison:
• VeRL / RAGEN → ~10 turns
• verl-agent → ~50 turns
• Verlog (ours) → 400+ turns 🔥

⚙️ Technical foundation:

Tianbao Xie (@tianbaox):

Where are our computer-use agents (CUA) standing on OSWorld-Verified? Potentially already ~80%.

We made this analysis, which summarizes the latest OSWorld-Verified submissions with 27 models evaluated over 369 tasks, and conducted a case study on the o3+Jedi-7B approach to
Xinyuan Wang (@xywang626):

We are super excited to release OpenCUA, the first 0-to-1 computer-use agent foundation model framework and open-source SOTA model OpenCUA-32B, matching top proprietary models on OSWorld-Verified, with full infrastructure and data.

🔗 [Paper] arxiv.org/abs/2508.09123
📌
Prophet Arena (@prophetarena):

🔮 Introducing Prophet Arena, the AI benchmark for general predictive intelligence.

That is, can AI truly predict the future by connecting today's dots?

👉 What makes it special?

- It can't be hacked. Most benchmarks saturate over time, but here models face live, unseen
Dynamics Lab (@dynamicslab_ai):

Try Mirage 2 now → dynamicslab.ai. Here are a few worlds we've created, starting from images and prompts. More in the thread 👇 2/

Zhiting Hu (@zhitinghu):

🔥 Super excited to launch Mirage 2, a big leap toward a general-purpose world engine for live interactive play 🎮 Hard to believe how far we've come in just one month since Mirage 1 ⏩ If you're impressed by Genie 3, come play with Mirage 2. It's live, offering an extended

Yichao Fu (@fuyichao123):

Excited to share my 1st project as a Research Scientist Intern at Meta FAIR! Grateful to my mentor Jiawei Zhao for guidance, and to Yuandong Tian & Xuewei for their valuable advice and collaboration. Our work DeepConf explores local confidence for more accurate & efficient LLM reasoning!

Yuandong Tian (@tydsh):

We released DeepConf, which can achieve 99.9% on AIME'25 with open-source models using only 15% of the compute compared to majority voting@512. The secret? Simple: just prune the rollouts if they show a consecutive stream of low confidence 😀. Can be applied to any models
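The pruning rule described above can be sketched as a simple check on recent token confidences. The window size, threshold, and confidence measure below are assumptions for illustration, not the paper's exact settings.

```python
import math

def should_prune(token_logprobs, window=16, min_conf=0.3):
    """Illustrative confidence-based rollout pruning (not the official DeepConf code).

    token_logprobs: log-probabilities of the tokens generated so far in one rollout.
    Prune the rollout if the last `window` tokens all fall below a confidence
    threshold, i.e. the model has produced a consecutive stream of low-confidence
    tokens.
    """
    if len(token_logprobs) < window:
        return False
    recent = token_logprobs[-window:]
    return all(math.exp(lp) < min_conf for lp in recent)

# During generation, check after each new token and stop the rollout early:
# if should_prune(logprobs): break
```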

Feng Yao (@fengyao1909):

We are glad that TIS and FlashRL have received broad attention from the open-source community and have been verified and supported (OpenRLHF: Jian Hu, SkyRL: NovaSky, REINFORCE++: Jian Hu, OAT: Zichen Liu)!

A few updates on our blog and FlashRL package:
(1) more in-depth
Daria Soboleva (@dmsobol):

The router wasn't learning at first. We debugged it step by step and showed how, despite perfect load balancing, routing can be completely useless. We root-caused and fixed the problem. Papers skip this methodology, but you can find all the details in part 3 of our MoE 101 series.
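The failure mode being described is easy to reproduce in a toy setting: a router whose logits ignore the token entirely can still look perfectly load-balanced. The example below is my own illustration of that point, not code from the MoE 101 post.

```python
import torch

torch.manual_seed(0)
tokens = torch.randn(1024, 64)          # a batch of token representations
num_experts = 8

# A "useless" router: its logits are independent of the token content.
random_logits = torch.randn(tokens.shape[0], num_experts)
assignments = random_logits.argmax(dim=-1)

# Load is near-perfectly balanced across experts...
load = torch.bincount(assignments, minlength=num_experts).float() / tokens.shape[0]
print(load)                             # roughly 1/8 per expert

# ...yet the assignment carries no information about the token, so a
# load-balancing metric alone cannot distinguish a learned router from a random one.
```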

Ari Holtzman (@universeinanegg):

One of the reasons academic science is so bad at producing, or even accepting, novelty is that the focus on hypothesis-driven science has caused people to view exploratory studies as inherently unrigorous.