Dang Nguyen (@dangnth97)'s Twitter Profile
Dang Nguyen

@dangnth97

PhD @CS_UCLA | Student Researcher @GoogleAI | IMO 2015 Silver

ID: 1635212781086711810

Link: https://hsgser.github.io/ · Joined: 13-03-2023 09:35:28

96 Tweets

273 Followers

1.1K Following

Ali Behrouz (@behrouz_ali)'s Twitter Profile Photo

Attention has been the key component for most advances in LLMs, but it can’t scale to long context. Does this mean we need to find an alternative? 

Presenting Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time. Titans
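The memorize-at-test-time idea can be sketched in a few lines. This is a toy, assumption-laden illustration (a plain linear associative memory updated by SGD on a reconstruction loss), not the actual Titans architecture:

```python
import numpy as np

# Toy sketch: a linear associative memory M maps keys to values. At test time,
# each (key, value) pair updates M by one gradient step on the reconstruction
# loss, so the memory "learns to memorize" while the sequence is being read.
rng = np.random.default_rng(0)
d = 8
M = np.zeros((d, d))          # the memory's parameters
lr = 0.5

def memory_update(M, k, v, lr):
    err = v - M @ k                      # "surprise": current prediction error
    return M + lr * np.outer(err, k)     # one SGD step on ||v - M k||^2

k = rng.standard_normal(d)
k /= np.linalg.norm(k)
v = rng.standard_normal(d)
for _ in range(50):                      # repeated exposure at test time
    M = memory_update(M, k, v, lr)

recalled = M @ k                         # the memory now reproduces v from k
```

With a unit-norm key, the recall error shrinks geometrically per step, which is the sense in which the memory module adapts without any training-time weight updates.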
Wenhu Chen (@wenhuchen)'s Twitter Profile Photo

Everyone is talking about RL these days. But are we done with SFT? The answer is NO. If we revive SFT in another form, it can even beat RL!

Very happy to introduce Critique Fine-Tuning, a new form of SFT, which can more efficiently activate language models' reasoning
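As a rough illustration of the data format such an approach might use (the field names and prompt template below are invented for illustration, not taken from the paper), a Critique Fine-Tuning example trains the model to critique a (question, candidate answer) pair rather than imitate a reference answer:

```python
# Hypothetical CFT example builder: the SFT target is a critique, not an answer.
def make_cft_example(question, candidate_answer, critique):
    prompt = (
        f"Question: {question}\n"
        f"Candidate answer: {candidate_answer}\n"
        "Critique the candidate answer step by step."
    )
    return {"input": prompt, "target": critique}

ex = make_cft_example(
    "What is 17 * 6?",
    "17 * 6 = 96",
    "Incorrect: 17 * 6 = 10*6 + 7*6 = 60 + 42 = 102, not 96.",
)
```

The ordinary SFT loss is then applied to `target`, so existing fine-tuning pipelines work unchanged; only the supervision signal differs.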
Nathan Lambert (@natolambert)'s Twitter Profile Photo

Since everyone wants to learn RL for language models now post DeepSeek, reminder that I've been working on this book quietly in the background for months. 

Policy gradient chapter is coming together. Plugging away at the book every day now.

rlhfbook dot com
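The core policy-gradient update such a chapter covers can be shown on a toy two-token "bandit". This is textbook REINFORCE with a running-mean baseline, not code from the book:

```python
import numpy as np

# REINFORCE on a 2-token vocabulary: sample a token, observe a reward,
# push up the log-probability of tokens with positive advantage.
rng = np.random.default_rng(0)
logits = np.zeros(2)                 # policy parameters
rewards = np.array([0.0, 1.0])       # token 1 is the "good" token
lr = 0.5
baseline = 0.0

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(200):
    p = softmax(logits)
    a = rng.choice(2, p=p)                   # sample an action (token)
    r = rewards[a]
    baseline += 0.1 * (r - baseline)         # running-mean baseline (variance reduction)
    grad_logp = -p                           # grad of log pi(a) w.r.t. logits...
    grad_logp[a] += 1.0                      # ...is onehot(a) - p
    logits += lr * (r - baseline) * grad_logp
```

After training, the policy concentrates on the rewarded token; in RLHF the reward comes from a reward model and the "actions" are sampled completions, but the update has the same shape.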
Hossein Mobahi (@thegradient)'s Twitter Profile Photo

(1/2) Ever wondered why Sharpness-Aware Minimization (SAM) yields greater generalization gains in vision than in NLP? I'll discuss this at UCLA CS-201 seminar February 18th, relating it to the balance of SAM's impact on logit statistics vs model geometry. cs.ucla.edu/upcoming-event…
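For readers unfamiliar with SAM, its two-step update (ascend to the worst-case point in an L2 ball of radius rho, then descend using the gradient computed there) can be sketched on a toy quadratic. This follows the published SAM definition, not anything specific to the talk:

```python
import numpy as np

def loss_grad(w):
    return 2.0 * w            # gradient of L(w) = ||w||^2

def sam_step(w, rho=0.05, lr=0.1):
    g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent to the worst case in the rho-ball
    g_adv = loss_grad(w + eps)                    # gradient at the perturbed point
    return w - lr * g_adv                         # descend with that gradient

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w)
```

On this quadratic the iterates settle near (but not exactly at) the minimum, since the perturbation keeps probing the neighborhood; that sharpness-aware neighborhood term is what drives the generalization effects the talk discusses.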

Thang Luong (@lmthang)'s Twitter Profile Photo

Excited to share details of AlphaGeometry2 (AG2), part of the system that achieved silver-medal standard at IMO 2024 last July! AG2 now has surpassed the average gold-medalist in solving Olympiad geometry problems, achieving a solving rate of 84% for all IMO geometry problems
Yihe Deng (@yihe__deng)'s Twitter Profile Photo

New paper & model release!

Excited to introduce DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails, showcasing our new DuoGuard-0.5B model.

- Model: huggingface.co/DuoGuard/DuoGu…
- Paper: arxiv.org/abs/2502.05163
- GitHub: github.com/yihedeng9/DuoG…

Grounded in a
Duy Nguyen (@duynguyen772)'s Twitter Profile Photo

LLMs must be helpful, unbiased, etc... but optimizing for one attribute can hurt others.

🚀 We introduce MAT-Steer for steering LLMs across multiple attributes w/out retraining!

✅ Beats best ITI baselines (+3% QA acc, 55.82% GPT-4 win rate)
✅ Matches LoRA with <20% data

🎯
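The general steering-vector mechanism MAT-Steer builds on can be sketched as adding per-attribute directions to a hidden state at inference time; the vectors and strengths below are invented for illustration and are not the method's actual learned parameters or gating:

```python
import numpy as np

# Activation steering sketch: shift a hidden state along one learned direction
# per attribute, with a per-attribute strength. No weights are retrained.
rng = np.random.default_rng(0)
d = 16
h = rng.standard_normal(d)                           # a hidden state at inference
steer = {                                            # one direction per attribute
    "helpfulness": rng.standard_normal(d),
    "unbiasedness": rng.standard_normal(d),
}
alphas = {"helpfulness": 0.8, "unbiasedness": 0.3}   # per-attribute strengths

def apply_steering(h, steer, alphas):
    out = h.copy()
    for name, v in steer.items():
        out += alphas[name] * v / np.linalg.norm(v)  # unit direction, scaled
    return out

h_steered = apply_steering(h, steer, alphas)
```

The multi-attribute difficulty the tweet mentions shows up here as interference between directions; methods in this family differ mainly in how the directions are learned and how the per-input strengths are gated.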
Yihe Deng (@yihe__deng)'s Twitter Profile Photo

🤖 I just updated my repository of RL(HF) summary notes to include a growing exploration of new topics, specifically adding notes to projects related to DeepSeek R1 reasoning. 

Take a look: github.com/yihedeng9/rlhf… 🚀

I’m hoping these summaries are helpful, and I’d love to hear
Yihe Deng (@yihe__deng)'s Twitter Profile Photo

🚀Excited to share our latest work: OpenVLThinker, an exploration into enhancing vision-language models with R1 reasoning capabilities. 

By iterative integration of SFT and RL, we enabled LVLMs to exhibit robust R1 reasoning behavior. As a result, OpenVLThinker achieves a 70.2%
Dang Nguyen (@dangnth97)'s Twitter Profile Photo

🎉 Achievement unlocked: I now have papers with all of my labmates, and somehow they all ended up at ICLR! I’ll be presenting our work “Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures” at #ICLR2025 🇸🇬 Come by and chat! 👋 on Fri, Apr 25 | 10 AM GMT+8

Siddharth Joshi (@sjoshi804)'s Twitter Profile Photo

#ICLR2025 Can you pre-train deep models with small, synthetic datasets? 🤯 We introduce the first effective dataset distillation method for self-supervised learning (SSL) — boosting downstream accuracy by up to 13% over baselines. 🧪 Poster #307, Sat Apr 26, 9am

Xuandong Zhao (@xuandongzhao)'s Twitter Profile Photo

🚀 Excited to share the most inspiring work I’ve been part of this year:
 
"Learning to Reason without External Rewards"

TL;DR: We show that LLMs can learn complex reasoning without access to ground-truth answers, simply by optimizing their own internal sense of confidence. 1/n
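The "internal sense of confidence" idea can be illustrated with a toy scoring function: reward a sampled reasoning trace by the mean negative entropy of its per-token predictive distributions, so no ground-truth answer is needed. This is only one plausible instantiation; the paper's exact objective may differ:

```python
import numpy as np

# Score a trace by the model's own token-level certainty:
# low average entropy over the trace => high "confidence" reward.
def confidence_reward(step_probs):
    ent = [-(p * np.log(p + 1e-12)).sum() for p in step_probs]
    return -float(np.mean(ent))          # higher = more confident

peaked  = [np.array([0.97, 0.01, 0.01, 0.01])] * 3   # confident trace
uniform = [np.ones(4) / 4] * 3                        # uncertain trace
```

Such a reward can then be plugged into any RL loop in place of a verifier signal; the open question the paper addresses is whether optimizing it actually improves reasoning rather than just sharpening the distributions.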
Yihao Xue (@xue_yihao65785)'s Twitter Profile Photo

🎉 Our paper “Representations Shape Weak-to-Strong Generalization” is accepted at #ICML2025!
We study weak-to-strong generalization (W2SG)—a core problem in superalignment—and offer new insights into the role of models' internal representations in W2SG.
1/
Chuong M. Huynh (@ryanhuynh1108)'s Twitter Profile Photo

CVPR-bound! ✈️ I'll be presenting CoLLM on Friday, 6/13 (Morning, #364) and looking for my next challenge as a full-time Scientist/Engineer. If you're hiring or just want to chat about exciting research, find me there! My work: hmchuong.github.io #CVPR2025 #JobHunt

Nouha Dziri (@nouhadziri)'s Twitter Profile Photo

📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? 

Remember, DeepSeek R1 and o1 impressed us on Olympiad-level math, yet they were failing at simple arithmetic 😬

 We built a benchmark to find out → OMEGA Ω 📐

💥 We found
Tung Nguyen (@tungnd_13)'s Twitter Profile Photo

🚀 Introducing PhysiX: One of the first large-scale foundation models for physics simulations! PhysiX is a 4.5B parameter model that unifies a wide range of physical systems, from fluid dynamics to reaction-diffusion, outperforming specialized, state-of-the-art models.

Thang Luong (@lmthang)'s Twitter Profile Photo

Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level in the International Mathematical Olympiad! 🏆, solving five out of six problems perfectly, as verified by the IMO organizers! It’s been a wild run to lead this

Lin Yang (@lyang36)'s Twitter Profile Photo

🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025

Pratyush Maini (@pratyushmaini)'s Twitter Profile Photo

1/Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today DatologyAI shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens🧑🏼‍🍳
- 3B LLMs beat 8B models🚀
- Pareto frontier for performance
Jason Weston (@jaseweston)'s Twitter Profile Photo

🤖Introducing OptimalThinkingBench 🤖
📝: arxiv.org/abs/2508.13141
- Thinking LLMs use a lot of tokens & overthink; non-thinking LLMs underthink & underperform.
- We introduce a benchmark which scores models in the quest to find the best mix.
- OptimalThinkingBench reports the F1
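The thread is cut off at "reports the F1", but the generic F1 formula (a harmonic mean of two component scores) is worth recalling. How the benchmark defines its two components, presumably one penalizing overthinking and one penalizing underthinking, is in the paper; this is only the standard formula:

```python
# Harmonic mean of two scores in [0, 1]: high only when BOTH are high,
# which is why F1-style metrics suit "find the best mix" evaluations.
def f1(a, b, eps=1e-12):
    return 2 * a * b / (a + b + eps)

assert abs(f1(1.0, 1.0) - 1.0) < 1e-9   # perfect on both components
score = f1(0.9, 0.5)                     # one weak component drags the score down
```

A model that only overthinks or only underthinks scores poorly on one component, so the harmonic mean cannot be gamed by excelling at just one side.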