Dang Nguyen (@dangnth97)'s Twitter Profile
Dang Nguyen

@dangnth97

PhD @CS_UCLA | Student Researcher @GoogleAI | IMO 2015 Silver

ID: 1635212781086711810

Link: https://hsgser.github.io/ · Joined: 13-03-2023 09:35:28

96 Tweets

273 Followers

1.1K Following

Ali Behrouz (@behrouz_ali)'s Twitter Profile Photo

Attention has been the key component for most advances in LLMs, but it can’t scale to long context. Does this mean we need to find an alternative? 

Presenting Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time. Titans
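The memorize-at-test-time idea can be sketched in a few lines. This is a toy, assumption-laden illustration (a plain linear associative memory updated by SGD on a reconstruction loss), not the actual Titans architecture:

```python
import numpy as np

# Toy sketch: a linear associative memory M maps keys to values. At test time,
# each (key, value) pair updates M by one gradient step on the reconstruction
# loss, so the memory "learns to memorize" while the sequence is being read.
rng = np.random.default_rng(0)
d = 8
M = np.zeros((d, d))          # the memory's parameters
lr = 0.5

def memory_update(M, k, v, lr):
    err = v - M @ k                      # "surprise": current prediction error
    return M + lr * np.outer(err, k)     # one SGD step on ||v - M k||^2

k = rng.standard_normal(d)
k /= np.linalg.norm(k)
v = rng.standard_normal(d)
for _ in range(50):                      # repeated exposure at test time
    M = memory_update(M, k, v, lr)

recalled = M @ k                         # the memory now reproduces v from k
```

With a unit-norm key, the recall error shrinks geometrically per step, which is the sense in which the memory module adapts without any training-time weight updates.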
Wenhu Chen (@wenhuchen)'s Twitter Profile Photo

Everyone is talking about RL these days. But are we done with SFT? The answer is NO. If we revive SFT in another form, it can even beat RL!

Very happy to introduce Critique Fine-Tuning, a new form of SFT, which can more efficiently activate language models' reasoning
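As a rough illustration of the data format such an approach might use (the field names and prompt template below are invented for illustration, not taken from the paper), a Critique Fine-Tuning example trains the model to critique a (question, candidate answer) pair rather than imitate a reference answer:

```python
# Hypothetical CFT example builder: the SFT target is a critique, not an answer.
def make_cft_example(question, candidate_answer, critique):
    prompt = (
        f"Question: {question}\n"
        f"Candidate answer: {candidate_answer}\n"
        "Critique the candidate answer step by step."
    )
    return {"input": prompt, "target": critique}

ex = make_cft_example(
    "What is 17 * 6?",
    "17 * 6 = 96",
    "Incorrect: 17 * 6 = 10*6 + 7*6 = 60 + 42 = 102, not 96.",
)
```

The ordinary SFT loss is then applied to `target`, so existing fine-tuning pipelines work unchanged; only the supervision signal differs.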
Nathan Lambert (@natolambert)'s Twitter Profile Photo

Since everyone wants to learn RL for language models now post DeepSeek, reminder that I've been working on this book quietly in the background for months. 

Policy gradient chapter is coming together. Plugging away at the book every day now.

rlhfbook dot com
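The core policy-gradient update such a chapter covers can be shown on a toy two-token "bandit". This is textbook REINFORCE with a running-mean baseline, not code from the book:

```python
import numpy as np

# REINFORCE on a 2-token vocabulary: sample a token, observe a reward,
# push up the log-probability of tokens with positive advantage.
rng = np.random.default_rng(0)
logits = np.zeros(2)                 # policy parameters
rewards = np.array([0.0, 1.0])       # token 1 is the "good" token
lr = 0.5
baseline = 0.0

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(200):
    p = softmax(logits)
    a = rng.choice(2, p=p)                   # sample an action (token)
    r = rewards[a]
    baseline += 0.1 * (r - baseline)         # running-mean baseline (variance reduction)
    grad_logp = -p                           # grad of log pi(a) w.r.t. logits...
    grad_logp[a] += 1.0                      # ...is onehot(a) - p
    logits += lr * (r - baseline) * grad_logp
```

After training, the policy concentrates on the rewarded token; in RLHF the reward comes from a reward model and the "actions" are sampled completions, but the update has the same shape.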
Hossein Mobahi (@thegradient)'s Twitter Profile Photo

(1/2) Ever wondered why Sharpness-Aware Minimization (SAM) yields greater generalization gains in vision than in NLP? I'll discuss this at UCLA CS-201 seminar February 18th, relating it to the balance of SAM's impact on logit statistics vs model geometry. cs.ucla.edu/upcoming-event…
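For readers unfamiliar with SAM, its two-step update (ascend to the worst-case point in an L2 ball of radius rho, then descend using the gradient computed there) can be sketched on a toy quadratic. This follows the published SAM definition, not anything specific to the talk:

```python
import numpy as np

def loss_grad(w):
    return 2.0 * w            # gradient of L(w) = ||w||^2

def sam_step(w, rho=0.05, lr=0.1):
    g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent to the worst case in the rho-ball
    g_adv = loss_grad(w + eps)                    # gradient at the perturbed point
    return w - lr * g_adv                         # descend with that gradient

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w)
```

On this quadratic the iterates settle near (but not exactly at) the minimum, since the perturbation keeps probing the neighborhood; that sharpness-aware neighborhood term is what drives the generalization effects the talk discusses.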

Thang Luong (@lmthang)'s Twitter Profile Photo

Excited to share details of AlphaGeometry2 (AG2), part of the system that achieved silver-medal standard at IMO 2024 last July! AG2 now has surpassed the average gold-medalist in solving Olympiad geometry problems, achieving a solving rate of 84% for all IMO geometry problems
Yihe Deng (@yihe__deng)'s Twitter Profile Photo

New paper & model release!

Excited to introduce DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails, showcasing our new DuoGuard-0.5B model.

- Model: huggingface.co/DuoGuard/DuoGu…
- Paper: arxiv.org/abs/2502.05163
- GitHub: github.com/yihedeng9/DuoG…

Grounded in a
Duy Nguyen (@duynguyen772)'s Twitter Profile Photo

LLMs must be helpful, unbiased, etc... but optimizing for one attribute can hurt others.

🚀 We introduce MAT-Steer for steering LLMs across multiple attributes w/out retraining!

✅ Beats best ITI baselines (+3% QA acc, 55.82% GPT-4 win rate)
✅ Matches LoRA with <20% data

🎯
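The general steering-vector mechanism MAT-Steer builds on can be sketched as adding per-attribute directions to a hidden state at inference time; the vectors and strengths below are invented for illustration and are not the method's actual learned parameters or gating:

```python
import numpy as np

# Activation steering sketch: shift a hidden state along one learned direction
# per attribute, with a per-attribute strength. No weights are retrained.
rng = np.random.default_rng(0)
d = 16
h = rng.standard_normal(d)                           # a hidden state at inference
steer = {                                            # one direction per attribute
    "helpfulness": rng.standard_normal(d),
    "unbiasedness": rng.standard_normal(d),
}
alphas = {"helpfulness": 0.8, "unbiasedness": 0.3}   # per-attribute strengths

def apply_steering(h, steer, alphas):
    out = h.copy()
    for name, v in steer.items():
        out += alphas[name] * v / np.linalg.norm(v)  # unit direction, scaled
    return out

h_steered = apply_steering(h, steer, alphas)
```

The multi-attribute difficulty the tweet mentions shows up here as interference between directions; methods in this family differ mainly in how the directions are learned and how the per-input strengths are gated.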
Yihe Deng (@yihe__deng)'s Twitter Profile Photo

🤖 I just updated my repository of RL(HF) summary notes to include a growing exploration of new topics, specifically adding notes to projects related to DeepSeek R1 reasoning. 

Take a look: github.com/yihedeng9/rlhf… 🚀

I’m hoping these summaries are helpful, and I’d love to hear
Yihe Deng (@yihe__deng)'s Twitter Profile Photo

🚀Excited to share our latest work: OpenVLThinker, an exploration into enhancing vision-language models with R1 reasoning capabilities. 

By iterative integration of SFT and RL, we enabled LVLMs to exhibit robust R1 reasoning behavior. As a result, OpenVLThinker achieves a 70.2%
Dang Nguyen (@dangnth97)'s Twitter Profile Photo

🎉 Achievement unlocked: I now have papers with all of my labmates, and somehow they all ended up at ICLR! I’ll be presenting our work “Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures” at #ICLR2025 🇸🇬 Come by and chat! 👋 on Fri, Apr 25 | 10 AM GMT+8

Siddharth Joshi (@sjoshi804)'s Twitter Profile Photo

#ICLR2025 Can you pre-train deep models with small, synthetic datasets? 🤯 We introduce the first effective dataset distillation method for self-supervised learning (SSL) — boosting downstream accuracy by up to 13% over baselines. 🧪 Poster #307, Sat Apr 26, 9am

Xuandong Zhao (@xuandongzhao)'s Twitter Profile Photo

🚀 Excited to share the most inspiring work I’ve been part of this year:
 
"Learning to Reason without External Rewards"

TL;DR: We show that LLMs can learn complex reasoning without access to ground-truth answers, simply by optimizing their own internal sense of confidence. 1/n
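The "internal sense of confidence" idea can be illustrated with a toy scoring function: reward a sampled reasoning trace by the mean negative entropy of its per-token predictive distributions, so no ground-truth answer is needed. This is only one plausible instantiation; the paper's exact objective may differ:

```python
import numpy as np

# Score a trace by the model's own token-level certainty:
# low average entropy over the trace => high "confidence" reward.
def confidence_reward(step_probs):
    ent = [-(p * np.log(p + 1e-12)).sum() for p in step_probs]
    return -float(np.mean(ent))          # higher = more confident

peaked  = [np.array([0.97, 0.01, 0.01, 0.01])] * 3   # confident trace
uniform = [np.ones(4) / 4] * 3                        # uncertain trace
```

Such a reward can then be plugged into any RL loop in place of a verifier signal; the open question the paper addresses is whether optimizing it actually improves reasoning rather than just sharpening the distributions.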
Yihao Xue (@xue_yihao65785)'s Twitter Profile Photo

🎉 Our paper “Representations Shape Weak-to-Strong Generalization” is accepted at #ICML2025!
We study weak-to-strong generalization (W2SG)—a core problem in superalignment—and offer new insights into the role of models' internal representations in W2SG.
1/
Chuong M. Huynh (@ryanhuynh1108)'s Twitter Profile Photo

CVPR-bound! ✈️ I'll be presenting CoLLM on Friday, 6/13 (Morning, #364) and looking for my next challenge as a full-time Scientist/Engineer. If you're hiring or just want to chat about exciting research, find me there! My work: hmchuong.github.io #CVPR2025 #JobHunt

Nouha Dziri (@nouhadziri)'s Twitter Profile Photo

📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies? 

Remember, DeepSeek R1 and o1 impressed us on Olympiad-level math, yet they were failing at simple arithmetic 😬

 We built a benchmark to find out → OMEGA Ω 📐

💥 We found
Tung Nguyen (@tungnd_13)'s Twitter Profile Photo

🚀 Introducing PhysiX: One of the first large-scale foundation models for physics simulations! PhysiX is a 4.5B parameter model that unifies a wide range of physical systems, from fluid dynamics to reaction-diffusion, outperforming specialized, state-of-the-art models.

Thang Luong (@lmthang)'s Twitter Profile Photo

Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level in the International Mathematical Olympiad! 🏆, solving five out of six problems perfectly, as verified by the IMO organizers! It’s been a wild run to lead this

Lin Yang (@lyang36)'s Twitter Profile Photo

🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025

Pratyush Maini (@pratyushmaini)'s Twitter Profile Photo

1/Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today DatologyAI shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens🧑🏼‍🍳
- 3B LLMs beat 8B models🚀
- Pareto frontier for performance
Jason Weston (@jaseweston)'s Twitter Profile Photo

🤖Introducing OptimalThinkingBench 🤖
📝: arxiv.org/abs/2508.13141
- Thinking LLMs use a lot of tokens & overthink; non-thinking LLMs underthink & underperform.
- We introduce a benchmark which scores models in the quest to find the best mix.
- OptimalThinkingBench reports the F1
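The thread is cut off at "reports the F1", but the generic F1 formula (a harmonic mean of two component scores) is worth recalling. How the benchmark defines its two components, presumably one penalizing overthinking and one penalizing underthinking, is in the paper; this is only the standard formula:

```python
# Harmonic mean of two scores in [0, 1]: high only when BOTH are high,
# which is why F1-style metrics suit "find the best mix" evaluations.
def f1(a, b, eps=1e-12):
    return 2 * a * b / (a + b + eps)

assert abs(f1(1.0, 1.0) - 1.0) < 1e-9   # perfect on both components
score = f1(0.9, 0.5)                     # one weak component drags the score down
```

A model that only overthinks or only underthinks scores poorly on one component, so the harmonic mean cannot be gamed by excelling at just one side.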