Furong Huang (@furongh) Twitter Tweets • TwiCopy

Furong Huang

3 months ago

Everyone’s hyped about test-time scaling—more steps, longer traces, just add “Wait” or “Let me rethink,” and boom: better reasoning? Not quite. We find that performance almost always improves at first—then declines. Classic overthinking. That’s not news. But why does it happen?

69 likes · 0 replies · 15 retweets

Furong Huang

@furongh

2 months ago

This inspired me.

17 likes · 0 replies · 0 retweets

Furong Huang

@furongh

2 months ago

Great minds think alike! 👀🧠 We also found that more thinking ≠ better reasoning. In our recent paper (arxiv.org/abs/2506.04210), we show how output variance creates the illusion of improvement—when in fact, it can hurt precision. Naïve test-time scaling needs a rethink. 👇

96 likes · 4 replies · 13 retweets

Furong Huang

@furongh

a month ago

Highly recommend working with Niloofar!

13 likes · 2 replies · 1 retweet

Furong Huang

@furongh

a month ago

I agree. We shouldn’t adopt a zero-sum mindset. It’s toxic to our community. We should focus on accepting good papers, not artificially limiting acceptance. I also don’t see why acceptance rate needs to be the metric for a conference’s prestige.

17 likes · 1 reply · 1 retweet

Furong Huang

@furongh

a month ago

Would love to learn more. Curious—what exactly defines a frontier lab? It seems people aren’t the key factor, since talent is expected to come from academia. So is it all about compute and infrastructure? If only funding agencies realized: it’s not magic, it’s money. That’s what

27 likes · 2 replies · 2 retweets

Nathan Lambert

@natolambert

a month ago

People are always asking for recommendations for other great content to read, but few people find that I maintain a full list of recommendations with my blog Interconnects (link to page below). Here's the list in no structured order: 1. Helen Toner (Helen Toner), Rising Tide:

351 likes · 8 replies · 50 retweets

Kangwook Lee

@kangwook_lee

a month ago

🧵When training reasoning models, what's the best approach? SFT, Online RL, or perhaps Offline RL? At KRAFTON AI and SK telecom, we've explored this critical question, uncovering interesting insights! Let’s dive deeper, starting with the basics first. 1) SFT SFT (aka hard

149 likes · 4 replies · 31 retweets

Furong Huang

@furongh

a month ago

NeurIPS rebuttal deadline is around the corner 😬 I’m not an expert, but thought I’d drop my two cents on how to write a good rebuttal, especially for folks writing their first few. Hope this helps someone! 🧵👇 (And please chime in with your own tips; let’s crowdsource the

474 likes · 8 replies · 48 retweets

Furong Huang

@furongh

a month ago

Thanks for sharing, John.

1 like · 0 replies · 0 retweets

Furong Huang

@furongh

a month ago

Huge congratulations Sanae Lotfi! I absolutely enjoyed your presentation and your dissertation!!!

27 likes · 2 replies · 1 retweet

Cas (Stephen Casper)

@stephenlcasper

a month ago

🧵 New paper from AI Security Institute x EleutherAI that I led with Kyle O’Brien: Open-weight LLM safety is both important & neglected. But we show that filtering dual-use knowledge from pre-training data improves tamper resistance *>10x* over post-training baselines.

197 likes · 7 replies · 40 retweets

FAR.AI

@farairesearch

a month ago

Singapore Alignment Workshop videos are live! Hear from Furong Huang, Tianwei Zhang, Jiaming Ji, Animesh Mukherjee, Weiyan Shi@ICLR and CHI, Yinpeng Dong, Cassidy Laidlaw, Pin-Yu Chen, Baoyuan Wu + more.

70 likes · 3 replies · 8 retweets

Saining Xie

@sainingxie

20 days ago

I know op is click-baiting, but let me bite... fwiw every researcher’s DREAM is to find out their architecture is wrong. If it’s never wrong, that’s a bigger problem. we try to break DiT every day w/ SiT, REPA, REPA-E etc. but you gotta form hypotheses, run experiments, test, not

531 likes · 12 replies · 51 retweets

Furong Huang

@furongh

19 days ago

💃🕺🪩 DISCO 🪩 🕺💃 is now accepted to EMNLP Findings. Congratulations to Yuhang Zhou and collaborators!

46 likes · 0 replies · 8 retweets

AK

@_akhaliq

6 days ago

LLaVA-Critic-R1 Your Critic Model is Secretly a Strong Policy Model

105 likes · 5 replies · 22 retweets

Xiyao Wang

@xiyaowang10

5 days ago

Thanks to AK for sharing our paper!🎉 Training a generative critic model to judge responses makes it BETTER at EVERYTHING. Sometimes the best policy comes from good judgment. Your critic model has been hiding its true potential🌟 🚀Introducing LLaVA-Critic-R1, a family of VLMs

12 likes · 1 reply · 5 retweets

Xiyao Wang

@xiyaowang10

5 days ago

4/ The correlation shocked us. We tracked both capabilities during training. Steps 0-200: near-perfect correlation between critic and policy performance. Step 350: policy peaks as critic stabilizes. They're not independent skills. They're fundamentally linked.

3 likes · 1 reply · 2 retweets

Furong Huang

@furongh

5 days ago

Your critic model is secretly a strong policy model. Stay tuned for a deep dive 🤩

24 likes · 0 replies · 1 retweet

Furong Huang

@furongh

3 days ago

✨ New semester, new beginnings! Grateful to start the term with such an inspiring group — the smartest, most diligent, and most creative minds. Together, our lab is pushing the boundaries of trustworthy AI agents, from digital systems to embodied robots. Excited for the

118 likes · 5 replies · 3 retweets