Zhuolin Yang (@lucas110550)'s Twitter Profile
Zhuolin Yang

@lucas110550

Research Scientist @NVIDIA, Ph.D @UofIllinois.
Words are my own.

ID: 696579442026663936

Link: https://lucas110550.github.io/about/ · Joined: 08-02-2016 06:20:55

14 Tweets

24 Followers

30 Following

Yang Chen (@ychennlp)'s Twitter Profile Photo

📢We conduct a systematic study to demystify the synergy between SFT and RL for reasoning models.

The result? We trained a 7B model - AceReason-Nemotron-1.1, significantly improved from version 1.0 on math and coding benchmarks.

✅AIME2025 (math): 53.6% -> 64.8%
✅LiveCodeBench
Wei Ping (@_weiping)'s Twitter Profile Photo

Introducing AceReason-Nemotron 1.1

Our previous release, AceReason-Nemotron-1.0, introduced a stage-wise RL recipe that was applied sequentially to math-only and code-only prompts, demonstrating both high efficiency and strong effectiveness.
Here, we systematically investigate
Zihan (Johan) Liu (@zihan_johan_liu)'s Twitter Profile Photo

With a stronger SFT backbone, AceReason-Nemotron-1.1-7B significantly outperforms its predecessor and sets record-high performance among Qwen2.5-7B-based reasoning models.

📄Report: arxiv.org/pdf/2506.13284
🤗Model: huggingface.co/nvidia/AceReas…
📚SFT Data: huggingface.co/datasets/nvidi…

Zhuolin Yang (@lucas110550)'s Twitter Profile Photo

Our released evaluation toolkit can reproduce our AceReason-Nemotron models' numbers (see below):

AceReason-Nemotron-1.0-7B:

LiveCodeBench (Avg@8):
* [05/23-05/24]: 72.0; [06/24-01/25]: 54.2
* release set v5: 51.2; release set v6: 44.4

AIME (Avg@64):
* AIME'24: 68.6; AIME'25:
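For reference, Avg@k here denotes accuracy averaged over k independent generations per problem (Avg@8 for LiveCodeBench, Avg@64 for AIME). Below is a minimal sketch of that metric, not the released toolkit; it assumes per-sample pass/fail results are already available, and all names are illustrative.

```python
# Minimal sketch of an Avg@k score: average each problem's correctness over
# its k sampled generations, then average across problems.
# This is an illustrative helper, not code from the released evaluation toolkit.
from typing import List

def avg_at_k(correct: List[List[bool]]) -> float:
    """correct[i][j] is True if sample j for problem i passed all checks."""
    per_problem = [sum(samples) / len(samples) for samples in correct]
    return sum(per_problem) / len(per_problem)

# Example: 2 problems, k = 4 samples each.
score = avg_at_k([[True, True, False, True], [False, False, True, False]])
print(f"Avg@4 = {score:.3f}")  # (0.75 + 0.25) / 2 = 0.500
```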

Zhuolin Yang (@lucas110550)'s Twitter Profile Photo

I stayed up late last night to (unofficially) participate in the ICPC WF25 online mirror using our dev coding LLM. So far, it has solved 5 of the 12 problems - D, F, H, K, L.

FYI: I'm using a small-scale LLM, so it can be deployed on a single GPU.

Some brief
Jasper Dekoninck (@j_dekoninck)'s Twitter Profile Photo

A new open reasoning model, K2-Think, was recently released, boasting scores comparable to GPT-OSS 120B and drawing a lot of media attention.

However, its reported performance relies on a flawed evaluation marked by contamination, unfair comparisons, and misrepresentation of results. 🧵