Changyu Chen (@cameron_chann)'s Twitter Profile
Changyu Chen

@cameron_chann

PhD student @sgSMU. RL x LLMs. Previously @NTUsg, @ZJU_China

Post-training for Sailor2

ID: 1266549323581411328

Link: https://cameron-chen.github.io/
Joined: 30-05-2020 01:58:14

95 Tweets

223 Followers

218 Following

Wenhu Chen (@wenhuchen)'s Twitter Profile Photo

🚀 General-Reasoner: Generalizing LLM Reasoning Across All Domains (Beyond Math)

Most recent RL/R1 works focus on math reasoning—but math-only tuning doesn't generalize to general reasoning (e.g. drop on MMLU-Pro and SuperGPQA). Why are we limited to math reasoning?

1. Existing
Zichen Liu @ ICLR2025 (@zzlccc)'s Twitter Profile Photo

I can feel #ICLR2025 has already started… welcome everyone to 🇸🇬 Singapore! Let's meet and chat about RL, LLMs, and reasoning :)

Jinjie Ni @ ICLR'25 🇸🇬 (@nijinjie)'s Twitter Profile Photo

Remember the NoisyStudent topping ImageNet back in 2019🏆? Was it the last dance of noisy training? 

🍻 Meet NoisyRollout, our new noisy training efforts in building stronger o1-like visual reasoners. 

✨ With only 2.1k training data and zero additional training cost, it hits
Zichen Liu @ ICLR2025 (@zzlccc)'s Twitter Profile Photo

🚨 RL x LLM folks at #ICLR2025 — come join us during the Friday lunch break!

If you haven’t RSVP’d on Whova, you can also register here: lu.ma/s8udv997?tk=B4…

Bo Liu (Benjamin Liu) and I will scout for a chill spot (likely a corner at the venue) and share the location tomorrow.
Hongfu Liu @ICLR 2025🇸🇬 (@waffle42567405)'s Twitter Profile Photo

I will attend #ICLR2025 🇸🇬 to present our work "On Calibration of LLM-based Guard Models for Reliable Content Moderation". We advocate reliability evaluation of LLM guardrail models as current ones are overconfident, miscalibrated, and brittle.
✨ Come see us at Hall 3 + Hall 2B
Fan Zhou✈️ICLR2025 (@fazhou_998)'s Twitter Profile Photo

Say hi to 🐙 OctoThinker — our new mid-training efforts for building strong reasoning base models tailored for the RL scaling era. Still a WIP, but we're excited to share our early insights into rethinking base model development.

📖 Blog: tinyurl.com/OctoThinker
🤗 Huggingface:
Zeyuan Allen-Zhu, Sc.D. (@zeyuanallenzhu)'s Twitter Profile Photo

(1/8)🍎A Galileo moment for LLM design🍎
As the Pisa Tower experiment sparked modern physics, our controlled synthetic pretraining playground reveals LLM architectures' true limits. A turning point that might divide LLM research into "before" and "after." physics.allen-zhu.com/part-4-archite…
John Yang (@jyangballin)'s Twitter Profile Photo

40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified.

We built it by synthesizing a ton of agentic training data from 100+ Python repos.

Today we’re open-sourcing the toolkit that made it happen: SWE-smith.
Zichen Liu @ ICLR2025 (@zzlccc)'s Twitter Profile Photo

Good catch! I ran into the same reusability issue ~1 year ago with OpenRLHF. That’s why I built oat🌾 (github.com/sail-sg/oat) — a modular RL LLM framework inspired by DeepMind’s ecosystem. Just define your actor, learner, and env in a single script — and you’re good to go :)
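The actor/learner/env split Zichen describes can be sketched generically. Note that this is a toy bandit illustration of the pattern, not oat's actual API; every class and function name below is hypothetical, and the "environment" is a trivial guessing game rather than an LLM rollout.

```python
import random

class Env:
    """Toy environment: reward 1.0 if the actor picks the hidden arm."""
    def __init__(self, seed=0):
        self.target = random.Random(seed).randint(0, 9)
    def step(self, action):
        return 1.0 if action == self.target else 0.0

class Actor:
    """Epsilon-greedy policy over 10 arms, scored by learned preferences."""
    def __init__(self):
        self.prefs = [0.0] * 10
    def act(self, rng, eps=0.1):
        if rng.random() < eps:                       # explore
            return rng.randrange(10)
        return max(range(10), key=lambda a: self.prefs[a])  # exploit

class Learner:
    """Moves the chosen arm's preference toward its observed reward."""
    def __init__(self, lr=0.5):
        self.lr = lr
    def update(self, actor, action, reward):
        actor.prefs[action] += self.lr * (reward - actor.prefs[action])

def train(steps=2000, seed=0):
    # The single-script loop: actor samples, env scores, learner updates.
    rng = random.Random(seed)
    env, actor, learner = Env(seed), Actor(), Learner()
    for _ in range(steps):
        action = actor.act(rng)
        reward = env.step(action)
        learner.update(actor, action, reward)
    return actor
```

Keeping the three roles behind small interfaces is what makes the loop swappable: a different env or learner drops in without touching the rest of the script.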

Changyu Chen (@cameron_chann)'s Twitter Profile Photo

Really interesting work on reasoning performance under any budget:
- Better than GRPO at any time, including in final performance
- Gives users the flexibility to trade off cost against performance
- Great engineering effort: code optimized for tree-like generation & training

Alexia Jolicoeur-Martineau (@jm_alexia)'s Twitter Profile Photo

I tried TRL again... I'm going back to OAT. Every time I try to use TRL, it's always a nightmare. OAT is plug and play. github.com/sail-sg/oat

Zichen Liu @ ICLR2025 (@zzlccc)'s Twitter Profile Photo

Reinforcing General Reasoning without Verifiers 🈚️

R1-Zero-like RL thrives in domains with verifiable rewards (code, math). But real-world reasoning (chem, bio, econ…) lacks easy rule-based verifiers — and model-based verifiers add complexity.

Introducing *VeriFree*:

⚡ Skip
Changyu Chen (@cameron_chann)'s Twitter Profile Photo

Highly agree! A strong prior is essential for the success of RL training in LLMs, as we show in the Llama experiments (arxiv.org/pdf/2503.20783); a strong prior also makes improvement so easy that it can create “RL just works” noise.

Jinjie Ni @ ICLR'25 🇸🇬 (@nijinjie)'s Twitter Profile Photo

Ready to supercharge your vision-language reasoners with scalable RL fuels? RL is promising, but good data is a bottleneck! 😤

🚀 Introducing SynthRL: a scalable pipeline with strong guarantees for synthesizing verifiable & progressively harder training data, tailor-made for RL in
Zichen Liu @ ICLR2025 (@zzlccc)'s Twitter Profile Photo

Nice follow-up! Spurious rewards and spurious prompts re-confirm the biases baked into Qwen base models. Revisiting our results from March (arxiv.org/pdf/2503.20783, Sections 2.2 & 3.3):
- Using no template works best
- Much of RL's gain comes from correcting model-template mismatch