Furong Huang (@furongh) 's Twitter Profile
Furong Huang

@furongh

Associate Professor at UMD (@umdcs, @umiacs, @ml_umd). Researcher in #AI/#ML, AI #Alignment, #RLHF, #Trustworthy ML, #EthicalAI, AI #Democratization, AI for ALL.

ID: 195674678

Link: https://furong-huang.com/ · Joined: 27-09-2010 09:11:38

1.1K Tweets

7.7K Followers

2.2K Following

Furong Huang (@furongh) 's Twitter Profile Photo

Everyone’s hyped about test-time scaling—more steps, longer traces, just add “Wait” or “Let me rethink,” and boom: better reasoning? Not quite. We find that performance almost always improves at first—then declines. Classic overthinking. That’s not news. But why does it happen?
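
For readers unfamiliar with the trick the tweet alludes to, here is a minimal sketch of "Wait"-style test-time scaling: after the model stops, append a rethink cue and let it keep generating, up to a step budget. The model name, prompt format, and budget below are illustrative assumptions, not the setup behind the tweet.

```python
# Hypothetical sketch of "Wait"-style test-time scaling: after each stop,
# append a rethink cue and generate again, up to a fixed number of extra rounds.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder model, not from the tweet
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def solve_with_budget(question: str, extra_rounds: int = 3) -> str:
    """Generate a reasoning trace, then force `extra_rounds` 'Wait' continuations."""
    text = f"Question: {question}\nLet's think step by step.\n"
    for round_idx in range(extra_rounds + 1):
        inputs = tok(text, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
        text = tok.decode(out[0], skip_special_tokens=True)
        if round_idx < extra_rounds:
            text += "\nWait, let me rethink.\n"  # the cue that extends the trace
    return text
```

The inverted-U the tweet describes means `extra_rounds` is a knob you have to tune, not a dial you can simply turn up.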

Furong Huang (@furongh) 's Twitter Profile Photo

Great minds think alike! 👀🧠 We also found that more thinking ≠ better reasoning. In our recent paper (arxiv.org/abs/2506.04210), we show how output variance creates the illusion of improvement—when in fact, it can hurt precision. Naïve test-time scaling needs a rethink. 👇
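
To make the "illusion of improvement" point concrete, a toy calculation of my own (not the analysis from arxiv.org/abs/2506.04210): with noisier, more variable outputs, sample-and-select metrics such as pass@k can stay near-perfect even while single-sample precision falls, which reads as improvement on the leaderboard.

```python
# Toy illustration only: pass@k hides a drop in per-sample precision
# when outputs are more variable and you sample many candidates.
def pass_at_k(p_correct: float, k: int) -> float:
    """Probability that at least one of k i.i.d. samples is correct."""
    return 1.0 - (1.0 - p_correct) ** k

for p, label in [(0.60, "short traces"), (0.45, "long, high-variance traces")]:
    print(f"{label}: precision={p:.2f}, pass@16={pass_at_k(p, 16):.4f}")
# Both settings score ~1.0 on pass@16, masking the precision gap.
```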

Furong Huang (@furongh) 's Twitter Profile Photo

I agree. We shouldn’t adopt a zero-sum mindset. It’s toxic to our community. We should focus on accepting good papers, not artificially limiting acceptance. I also don’t see why acceptance rate needs to be the metric for a conference’s prestige.

Furong Huang (@furongh) 's Twitter Profile Photo

Would love to learn more. Curious—what exactly defines a frontier lab? It seems people aren’t the key factor, since talent is expected to come from academia. So is it all about compute and infrastructure? If only funding agencies realized: it’s not magic, it’s money. That’s what

Nathan Lambert (@natolambert) 's Twitter Profile Photo

People are always asking for recommendations for other great content to read, but few people know that I maintain a full list of recommendations on my blog Interconnects (link to page below). Here's the list, in no particular order: 1. Helen Toner, Rising Tide:

Kangwook Lee (@kangwook_lee) 's Twitter Profile Photo

🧵When training reasoning models, what's the best approach? SFT, Online RL, or perhaps Offline RL? At KRAFTON AI and SK telecom, we've explored this critical question, uncovering interesting insights! Let’s dive deeper, starting with the basics first. 1) SFT SFT (aka hard
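
As a rough reference for the SFT baseline the thread starts from, here is a minimal supervised fine-tuning sketch on reasoning traces. The model name and single-example training step are assumptions for illustration, not the KRAFTON AI / SK telecom setup.

```python
# Minimal SFT sketch for reasoning traces: next-token cross-entropy on
# prompt + chain-of-thought + answer. Real pipelines usually mask the
# loss on prompt tokens and train on batches, not single examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
optim = torch.optim.AdamW(model.parameters(), lr=1e-5)

def sft_step(prompt: str, trace_and_answer: str) -> float:
    """One gradient step of supervised fine-tuning on a single example."""
    ids = tok(prompt + trace_and_answer, return_tensors="pt").input_ids
    out = model(ids, labels=ids)  # causal LM loss over the whole sequence
    out.loss.backward()
    optim.step()
    optim.zero_grad()
    return out.loss.item()
```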

Furong Huang (@furongh) 's Twitter Profile Photo

NeurIPS rebuttal deadline is around the corner 😬 I’m not an expert, but thought I’d drop my two cents on how to write a good rebuttal, especially for folks writing their first few. Hope this helps someone! 🧵👇 (And please chime in with your own tips; let’s crowdsource the

Cas (Stephen Casper) (@stephenlcasper) 's Twitter Profile Photo

🧵 New paper from AI Security Institute x EleutherAI that I led with Kyle O’Brien: Open-weight LLM safety is both important & neglected. But we show that filtering dual-use knowledge from pre-training data improves tamper resistance *>10x* over post-training baselines.
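
For intuition about what "filtering dual-use knowledge from pre-training data" can look like mechanically, a deliberately simplified sketch of a corpus filter. The keyword screen and term list are my assumptions for illustration; the paper's actual pipeline is not reproduced here.

```python
# Simplified pre-training data filter: drop documents flagged as dual-use
# before they enter the corpus. A real pipeline would use trained classifiers.
from typing import Iterable, Iterator

DUAL_USE_TERMS = {"synthesis route", "enrichment cascade"}  # illustrative placeholders

def looks_dual_use(doc: str) -> bool:
    """Cheap keyword screen over a single document."""
    lowered = doc.lower()
    return any(term in lowered for term in DUAL_USE_TERMS)

def filter_corpus(docs: Iterable[str]) -> Iterator[str]:
    """Yield only documents that pass the dual-use screen."""
    for doc in docs:
        if not looks_dual_use(doc):
            yield doc
```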

Saining Xie (@sainingxie) 's Twitter Profile Photo

I know OP is click-baiting, but let me bite... FWIW, every researcher’s DREAM is to find out their architecture is wrong. If it’s never wrong, that’s a bigger problem. We try to break DiT every day with SiT, REPA, REPA-E, etc., but you gotta form hypotheses, run experiments, test, not

Xiyao Wang (@xiyaowang10) 's Twitter Profile Photo

Thanks to AK for sharing our paper!🎉 Training a generative critic model to judge responses makes it BETTER at EVERYTHING. Sometimes the best policy comes from good judgment. Your critic model has been hiding its true potential🌟 🚀Introducing LLaVA-Critic-R1, a family of VLMs

Xiyao Wang (@xiyaowang10) 's Twitter Profile Photo

4/ The correlation shocked us. We tracked both capabilities during training:
Steps 0-200: Near-perfect correlation between critic and policy performance.
Step 350: Policy peaks as critic stabilizes.
They're not independent skills. They're fundamentally linked.
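
For anyone who wants to run this kind of tracking on their own checkpoints, a small sketch of correlating per-checkpoint critic and policy scores. The numbers below are made-up placeholders for illustration, not results from the paper.

```python
# Illustrative sketch (not the paper's evaluation code): compute the Pearson
# correlation between critic accuracy and policy accuracy across checkpoints.
import statistics

def pearson(xs: list[float], ys: list[float]) -> float:
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-checkpoint scores over training steps 0..350 (placeholders).
critic_acc = [0.42, 0.55, 0.63, 0.70, 0.74, 0.75, 0.75]
policy_acc = [0.30, 0.41, 0.50, 0.58, 0.63, 0.66, 0.66]
print(round(pearson(critic_acc, policy_acc), 3))
```
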
Furong Huang (@furongh) 's Twitter Profile Photo

✨ New semester, new beginnings! Grateful to start the term with such an inspiring group — the smartest, most diligent, and most creative minds. Together, our lab is pushing the boundaries of trustworthy AI agents, from digital systems to embodied robots. Excited for the
