Brian Huang ✈️ ICLR (@brianryhuang) 's Twitter Profile
Brian Huang ✈️ ICLR

@brianryhuang

@windsurf_ai
prev research at MIT madrylab & @haizelabs

ID: 1584559309828014080

Link: http://briteroses.github.io · Joined: 24-10-2022 14:56:06

2.2K Tweets

2.2K Followers

1.1K Following

David Bau (@davidbau) 's Twitter Profile Photo

ACADEMICS: it is time to get our heads out of our *sses. This is not the moment for personal ambition, for why your latest sophisticated widget beats a rival's intricate theorem. The scientific franchise is under attack. It is time to defend it to the public. x.com/davidbau/statu…

Anne Ouyang (@anneouyang) 's Twitter Profile Photo

✨ New blog post 👀: We have some very fast AI-generated kernels generated with a simple test-time only search. They are performing close to or in some cases even beating the standard expert-optimized production kernels shipped in PyTorch. (1/6)

[🔗 link in final post]
Zixuan Wang (@zzzixuanwang) 's Twitter Profile Photo

LLMs can solve complex tasks that require combining multiple reasoning steps. But when are such capabilities learnable via gradient-based training?

In our new COLT 2025 paper, we show that easy-to-hard data is necessary and sufficient!

arxiv.org/abs/2505.23683

🧵 below (1/10)
Brian Huang ✈️ ICLR (@brianryhuang) 's Twitter Profile Photo

Might be an obvious question, but I think it's important. Say you have N SFT pairs on the same input: (x, y_1), ..., (x, y_N). After SFT on these N pairs turns the model pi_0 into pi_SFT, does it always hold that pi_SFT(y_i|x) >= pi_0(y_i|x) for all i, 1 <= i <= N?
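For what it's worth, here is a minimal sketch of how one might probe the question empirically; the model (gpt2), the toy prompt, and the completions are arbitrary stand-ins for illustration, not anything from the thread:

```python
# Toy check: does SFT on N (x, y_i) pairs raise every pi(y_i|x)?
# Assumes a small Hugging Face causal LM; names and data are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM works for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

x = "Q: What is 2 + 2?\nA:"
ys = [" 4", " four", " 2 + 2 = 4"]  # N completions for the same prompt

def logprob_of_completion(model, x, y):
    """Sum of token log-probs of y given x (assumes prompt tokens are a prefix
    of the tokenization of x + y, which holds here since each y starts with a space)."""
    prompt_ids = tok(x, return_tensors="pt").input_ids
    full_ids = tok(x + y, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # log-prob of predicting token t+1 from position t
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    per_token = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # keep only the tokens that belong to the completion y
    return per_token[:, prompt_ids.shape[1] - 1:].sum().item()

before = [logprob_of_completion(model, x, y) for y in ys]

# A few epochs of toy SFT on all N pairs (full-sequence loss on x + y).
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for _ in range(3):
    for y in ys:
        ids = tok(x + y, return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss
        loss.backward()
        opt.step()
        opt.zero_grad()

model.eval()
after = [logprob_of_completion(model, x, y) for y in ys]
for i, (b, a) in enumerate(zip(before, after)):
    print(f"y_{i+1}: log pi_0 = {b:.3f}, log pi_SFT = {a:.3f}, increased = {a >= b}")
# In general the answer can be "no": minimizing the average NLL over all pairs
# trades probability mass between the y_i, so some of them may go down.
```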

X. Dong (@simonxindong) 's Twitter Profile Photo

It does not saturate yet. At NVIDIA, we present "prolonged RL" where we significantly scale up RL training steps (+2k) and problems (+130k). The improvement from RL scaling is surprising and exciting. The RL-ed model makes great progress on some problems that the base model

Brian Huang ✈️ ICLR (@brianryhuang) 's Twitter Profile Photo

I remember my upperclassman undergrad years were so bad that everyone lost faith in me, and I basically spent the summer afterwards holed up, addicted to video games and feeling guilty. The way the ML industry job market opened up to young talent is super fortunate and basically saved me

Casper Hansen (@casper_hansen_) 's Twitter Profile Photo

Latent reasoning does not work. Evals don't reproduce, and the approach doesn't scale with parameters. $1200 spent trying to reproduce Quiet-STaR, $500 spent trying to reproduce COCONUT, graciously funded by my PI 6 months ago when we thought this was hot, before RLVR took off

Uzay @ paris (@uzpg_) 's Twitter Profile Photo

Kaiwan Turel, awzf, and I were researching long-horizon reasoning (with Jacob Andreas). We found that existing benchmarks' hard problems often featured tricky puzzles, not tests of system understanding. So we made Breakpoint: a SWE benchmark designed to disambiguate this capability.

<a href="/kaivu/">Kaiwan Turel</a>, <a href="/atticuswzf/">awzf</a> , and I were researching long horizon reasoning (with <a href="/jacobandreas/">Jacob Andreas</a>). We found existing benchmarks’ hard problems often featured tricky puzzles, not tests of system understanding. So we made Breakpoint: a SWE benchmark designed to disambiguate this capability.
Varun Mohan (@_mohansolo) 's Twitter Profile Photo

With less than five days of notice, Anthropic decided to cut off nearly all of our first-party capacity to all Claude 3.x models. Given the short notice, we may see some short-term Claude 3.x model availability issues as we have very quickly ramped up capacity on other inference

Brian Huang ✈️ ICLR (@brianryhuang) 's Twitter Profile Photo

I'm gonna take a step back and only stick to academic research tweeting -- I still haven't been feeling well lately and I know my tweets have been erratic, sorry for putting that on the timeline

Brian Huang ✈️ ICLR (@brianryhuang) 's Twitter Profile Photo

Windsurf, Cursor, and the large model labs have never built a piece of marketing around badmouthing a competitor, and you should think about why that is

Brian Huang ✈️ ICLR (@brianryhuang) 's Twitter Profile Photo

One bottleneck in frontier model training I haven't seen much talk about: we want a way to convert natural-language labels into weight updates that goes beyond the pretraining objective, the SFT objective, or policy gradients. The status quo is that you have your human or LLM
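Purely as an illustration of the status quo the post is pointing at (not anything from the tweet itself), the two standard routes it names look roughly like this as toy loss functions; the function names and tensor shapes are assumptions:

```python
# Sketch of the status-quo objectives named in the post; assumed shapes,
# not any particular lab's implementation.
import torch
import torch.nn.functional as F

def sft_or_pretraining_loss(logits, target_ids):
    # Cross-entropy on demonstrated tokens: natural-language feedback only
    # enters indirectly, by deciding which target tokens to imitate.
    return F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))

def policy_gradient_loss(sampled_token_logprobs, reward):
    # REINFORCE-style update: the natural-language label is first collapsed
    # into a scalar reward, discarding most of the information it carries.
    return -(reward * sampled_token_logprobs.sum())
```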