Yang Chen (@ychennlp)'s Twitter Profile
Yang Chen

@ychennlp

accelerating @NVIDIA, phd @gtcomputing 🧊locked in

ID: 1043344251780833280

Link: http://edchengg.github.io · Joined: 22-09-2018 03:40:28

84 Tweets

844 Followers

491 Following

Zihan (Johan) Liu (@zihan_johan_liu)'s Twitter Profile Photo

Introducing AceMath-RL-Nemotron-7B, a math reasoning model trained entirely through reinforcement learning from DeepSeek-R1-Distilled-Qwen-7B. It achieves AIME24: 69.0%, AIME25: 53.6%, and GPQA: 52.1%. Interestingly, this math-focused RL training also improves the coding

Yang Chen (@ychennlp)'s Twitter Profile Photo

Had a lot of fun scaling up RL to improve math reasoning! Excited to introduce AceMath-RL-Nemotron-7B with a scalable training recipe
📑Full blog: research.nvidia.com/labs/adlr/acem…
🔗Model: huggingface.co/nvidia/AceMath…

Wei Ping (@_weiping)'s Twitter Profile Photo

Introducing AceReason-Nemotron: Advancing math and code reasoning through reinforcement learning (RL)

We propose conducting RL on math-only prompts first, then on code-only prompts. 
Our key findings include:
- Math-only RL significantly boosts both math and code benchmarks!
-
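
As a rough illustration of the stage-wise recipe described above (RL on math-only prompts first, then on code-only prompts), here is a minimal curriculum sketch. The stage names, prompt files, reward labels, and the run_rl_stage helper are hypothetical placeholders, not the released AceReason-Nemotron training code.

```python
# Hypothetical sketch of a stage-wise RL curriculum: math-only prompts first,
# then code-only prompts. All names below are illustrative placeholders.
STAGES = [
    {"name": "math_only_rl", "prompts": "math_prompts.jsonl",
     "reward": "final_answer_match"},
    {"name": "code_only_rl", "prompts": "code_prompts.jsonl",
     "reward": "unit_test_pass_rate"},
]

def run_rl_stage(checkpoint: str, stage: dict) -> str:
    """Placeholder for one RL stage; returns the resulting checkpoint tag."""
    print(f"RL on {stage['prompts']} (reward={stage['reward']}), "
          f"starting from {checkpoint}")
    return f"{checkpoint}+{stage['name']}"

checkpoint = "sft_starting_model"   # begin from the SFT/distilled model
for stage in STAGES:                # stages run sequentially, not mixed
    checkpoint = run_rl_stage(checkpoint, stage)
```
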
Yang Chen (@ychennlp)'s Twitter Profile Photo

With just math-RL, AceReason-Nemotron-14B surpasses DeepCoder-14B on LiveCodeBench v5.
We then did code-RL and found training became so much easier.
AK (@_akhaliq)'s Twitter Profile Photo

Nvidia just dropped AceReason-Nemotron on Hugging Face

Advancing Math and Code Reasoning through Reinforcement Learning
Yang Chen (@ychennlp)'s Twitter Profile Photo

Does RL incentivize reasoning capability beyond the starting SFT model? We show an interesting result with our recently published AceReason-Nemotron-7B model, which was trained with RL: pass@K from 1 to 1024 is consistently +10% on LiveCodeBench v6. Perhaps scaling RL is the key.
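
For context on the metric: pass@K is commonly estimated with the unbiased estimator from the Codex paper, given n sampled generations per problem of which c are correct. A minimal sketch in Python (the function name and example numbers are illustrative, not from the AceReason evaluation code):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn from n generations (c of them correct) is correct."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Illustrative: 1024 generations for one problem, 300 correct, report pass@64
print(pass_at_k(n=1024, c=300, k=64))
```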

Zhuolin Yang (@lucas110550)'s Twitter Profile Photo

Etash Guha Ryan Marten I tried to reproduce DS-R1-distilled-7B's and AceReason-7B's performance on your split (06/24-01/25), and they turn out to be 41.9 and 54.6 respectively, which is obviously higher than your reported numbers. Anything wrong here?

Yang Chen (@ychennlp)'s Twitter Profile Photo

📌Paper: arxiv.org/abs/2506.13284
📌Model: huggingface.co/nvidia/AceReas…
📌SFT Data: huggingface.co/datasets/nvidi…
📌Math RL Data: huggingface.co/datasets/nvidi…

A series of our work on reasoning models:

📌5/22/2025: AceReason-Nemotron: Scaling RL for math and code (7B and 14B)
Wei Ping (@_weiping)'s Twitter Profile Photo

Introducing AceReason-Nemotron 1.1

Our previous release, AceReason-Nemotron-1.0, introduced a stage-wise RL recipe that was applied sequentially to math-only and code-only prompts, demonstrating both high efficiency and strong effectiveness.
Here, we systematically investigate
Zihan (Johan) Liu (@zihan_johan_liu)'s Twitter Profile Photo

With a stronger SFT backbone, AceReason-Nemotron-1.1-7B significantly outperforms its predecessor and sets record-high performance among Qwen2.5-7B-based reasoning models.
📄Report: arxiv.org/pdf/2506.13284
🤗Model: huggingface.co/nvidia/AceReas…
📚SFT Data: huggingface.co/datasets/nvidi…

Yang Chen (@ychennlp)'s Twitter Profile Photo

The first thing we did was to make sure the eval setup is correct! We spent a lot of time making sure our eval can
- accurately reproduce the DeepSeek-R1 numbers on AIME, LiveCodeBench
- it's IMPOSSIBLE to track RL progress without a good eval setup (e.g., we see AIME up

Zhuolin Yang (@lucas110550)'s Twitter Profile Photo

Our released evaluation toolkit can reproduce our AceReason-Nemotron models' numbers (see below):
AceReason-Nemotron-1.0-7B:
LiveCodeBench (Avg@8):
* [05/23-05/24]: 72.0; [06/24-01/25]: 54.2
* release set v5: 51.2; release set v6: 44.4
AIME (Avg@64):
* AIME'24: 68.6; AIME'25:
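
For reference, Avg@K here denotes accuracy averaged over K independent generations per problem (e.g., Avg@8 for LiveCodeBench, Avg@64 for AIME). A minimal sketch, assuming a per-problem matrix of correctness flags; the function name is illustrative and not part of the released toolkit:

```python
from typing import Sequence

def avg_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Avg@K: per problem, the fraction of its K sampled generations
    that are correct, then averaged over all problems."""
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return sum(per_problem) / len(per_problem)

# Illustrative: 2 problems, K=4 generations each
print(avg_at_k([[True, True, False, True], [False, False, True, False]]))
```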

Sam Altman (@sama)'s Twitter Profile Photo

we have signed a deal for an additional 4.5 gigawatts of capacity with oracle as part of stargate. easy to throw around numbers, but this is a _gigantic_ infrastructure project.

some progress photos from abilene:
Elon Musk (@elonmusk)'s Twitter Profile Photo

Having thought about it some more, I think the 50 million H100 equivalent number in 5 years is about right. Eventually, billions.