Gaurav Ghosal (@gaurav_ghosal)'s Twitter Profile
Gaurav Ghosal

@gaurav_ghosal

Ph.D. Student @mldcmu | Former Undergraduate Student @berkeley_eecs and Researcher @berkeley_ai |

ID: 1618706956885331969

Joined: 26-01-2023 20:26:38

51 Tweets

170 Followers

175 Following

Yiding Jiang (@yidingjiang) 's Twitter Profile Photo

Abitha will be presenting our work on training language models to predict further into the future beyond the next token and the benefits this objective brings. x.com/gm8xx8/status/…
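The tweet does not spell out the objective, but a common way to train beyond next-token prediction is to attach extra heads that each predict a further-ahead token. The sketch below is an illustrative PyTorch version of that idea, not the paper's actual objective or code.

```python
# Hedged sketch: one common way to train beyond next-token prediction, using k output
# heads that each predict a different future offset. Names and the exact objective are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHorizonLM(nn.Module):
    def __init__(self, backbone: nn.Module, d_model: int, vocab_size: int, horizon: int = 4):
        super().__init__()
        self.backbone = backbone  # assumed: a causal transformer returning hidden states
        self.horizon = horizon
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(horizon)]
        )

    def loss(self, input_ids: torch.Tensor) -> torch.Tensor:
        h = self.backbone(input_ids)  # (batch, seq, d_model)
        total = 0.0
        for k, head in enumerate(self.heads, start=1):
            logits = head(h[:, :-k])          # hidden state at position t predicts token t + k
            targets = input_ids[:, k:]
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        return total / self.horizon
```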

Jiayi Geng (@jiayiigeng) 's Twitter Profile Photo

In "Mind Your Step (by Step): Chain‑of‑Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse", we connect human "overthinking" insights to LLM reasoning, offering a new lens on when thinking‑out‑loud backfires. 📄 Read the full paper: arxiv.org/abs/2410.21333

Ziqian Zhong (@fjzzq2002) 's Twitter Profile Photo

🤖 Some company just released a new set of open-weight LLMs well-suited for your production environment. However, you suspect that the models might be trained with backdoors or other hidden malicious behaviors. Is it still possible to deploy these models worry-free? (1/7)

Ravid Shwartz Ziv (@ziv_ravid) 's Twitter Profile Photo

The new OpenAI paper “Why Language Models Hallucinate” is more like PR than research. The claim that hallucinations arise because training/evaluation reward guessing over abstaining is decades-old (reject option classifiers, selective prediction).
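For readers unfamiliar with the cited ideas, the classic reject-option / selective-prediction rule is simply to answer only when confidence clears a threshold and abstain otherwise. The threshold and scoring rule below are illustrative defaults, not from any particular paper.

```python
# Hedged sketch of selective prediction with a reject option: commit to a prediction only
# when the model's confidence clears a threshold, otherwise abstain instead of guessing.
import numpy as np

def predict_or_abstain(probs: np.ndarray, threshold: float = 0.8):
    """probs: predicted class probabilities for one example."""
    best = int(np.argmax(probs))
    confidence = float(probs[best])
    if confidence >= threshold:
        return best   # answer
    return None       # abstain ("I don't know")
```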

Sachin Goyal (@goyalsachin007) 's Twitter Profile Photo

1/Excited to share the first in a series of my research updates on LLM pretraining🚀.
Our new work shows *distilled pretraining*—increasingly used to train deployable models—has trade-offs:
✅ Boosts test-time scaling
⚠️ Weakens in-context learning
✨ Needs tailored data curation
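As a rough illustration of what a distilled pretraining objective typically looks like (not necessarily the exact recipe in this work), one mixes the standard next-token cross-entropy with a KL term toward a teacher's token distribution; the temperature and mixing weight below are illustrative.

```python
# Hedged sketch of a distilled-pretraining objective: next-token cross-entropy on the data
# combined with a KL term toward a teacher model's token distribution.
import torch
import torch.nn.functional as F

def distilled_pretraining_loss(student_logits, teacher_logits, targets,
                               alpha: float = 0.5, temperature: float = 1.0):
    # Standard next-token cross-entropy against the observed tokens.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         targets.view(-1))
    # KL divergence from the teacher's softened distribution to the student's.
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(s_logprobs, t_logprobs, log_target=True, reduction="batchmean")
    return (1 - alpha) * ce + alpha * (temperature ** 2) * kl
```
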
Sachin Goyal (@goyalsachin007) 's Twitter Profile Photo

🚨 Super excited to finally share our Safety Pretraining work — along with all the artifacts (safe data, models, code)! In this thread 🧵, I’ll walk through our journey — the key intermediate observations and lessons, and how they helped shape our final pipeline.

Teachable Machine (@teachableai) 's Twitter Profile Photo

Researchers are working on ways to prevent large language models (LLMs) from simply memorizing information instead of truly learning. They found that removing memorized parts directly can harm the model's ability to learn new things. Their solution, called MemSinks, creates …

Aditi Raghunathan (@adtraghunathan) 's Twitter Profile Photo

There’s been a lot of work on unlearning in LLMs, trying to erase memorization without hurting capabilities — but we haven’t seen much success. ❓What if unlearning is actually doomed from the start? 👇This thread explains why and how *memorization sinks* offer a new way forward.

Pratyush Maini (@pratyushmaini) 's Twitter Profile Photo

One thing years of memorization research has made clear: unlearning is fundamentally hard. Neurons are polysemantic & concepts are massively distributed. There’s no clean 'delete'. We need architectures that are "unlearnable by design". Introducing, Memorization Sinks 🛁⬇️
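One way to read "unlearnable by design" is to route each document's memorization into dedicated parameters that can simply be dropped later. The sketch below is my own illustration of that idea, assuming per-document "sink" units that are gated on only while training on that document; it is not the MemSinks architecture or code.

```python
# Hedged illustration of the "memorization sink" idea: alongside shared hidden units,
# reserve sink units that are active only for a given document during training and are
# zeroed out at inference. Names and gating scheme are illustrative.
import torch
import torch.nn as nn

class SinkMLP(nn.Module):
    def __init__(self, d_model: int, d_shared: int, d_sink: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_shared + d_sink)
        self.down = nn.Linear(d_shared + d_sink, d_model)
        self.d_shared, self.d_sink = d_shared, d_sink

    def sink_mask(self, doc_ids: torch.Tensor) -> torch.Tensor:
        """Deterministically assign each document a small subset of the sink units."""
        mask = torch.ones(doc_ids.size(0), self.d_shared + self.d_sink)
        for i, doc in enumerate(doc_ids.tolist()):
            g = torch.Generator().manual_seed(doc)              # doc id seeds its sink subset
            keep = torch.rand(self.d_sink, generator=g) < 0.1   # ~10% of sinks per document
            mask[i, self.d_shared:] = keep.float()
        return mask

    def forward(self, x, doc_ids=None):
        h = torch.relu(self.up(x))                # (batch, seq, d_shared + d_sink)
        if doc_ids is None:                       # inference: drop all sink units
            h = h * torch.cat([torch.ones(self.d_shared), torch.zeros(self.d_sink)]).to(h)
        else:                                     # training: per-document sink units
            h = h * self.sink_mask(doc_ids).unsqueeze(1).to(h)
        return self.down(h)
```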

Sachin Goyal (@goyalsachin007) 's Twitter Profile Photo

I had early sneak peeks into this exciting work on rethinking pretraining—credits to Gaurav Ghosal, my constant buddy through countless late nights at CMU. It’s been a blast building pretraining frameworks and sharing insights. Gaurav Ghosal’s energy is absolutely unmatched!

Suhas Kotha (@kothasuhas) 's Twitter Profile Photo

Since compute grows faster than the web, we think the future of pre-training lies in the algorithms that will best leverage ♾ compute

We find simple recipes that improve the asymptote of compute scaling laws to be 5x data efficient, offering better perf w/ sufficient compute
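A hedged way to read "improving the asymptote": with the dataset held fixed, loss as a function of compute saturates, and a recipe that lowers the limiting value is equivalent to having more data. The generic form below is a standard parameterization, not necessarily the one fit in the paper.

```latex
% Generic saturating compute-scaling form (illustrative; the paper's exact fit may differ).
% With the dataset D held fixed, more compute C drives the loss toward an asymptote.
L(C \mid D) \;\approx\; L_\infty(D) + A\,C^{-\alpha},
\qquad \lim_{C \to \infty} L(C \mid D) = L_\infty(D).
% "5x data efficient" then means the improved recipe reaches roughly the asymptote
% the baseline recipe would only attain with five times as much data.
```
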
Aditi Raghunathan (@adtraghunathan) 's Twitter Profile Photo

I had the chance to join the TWIML podcast to talk about my group’s ICML 2025 papers! We dug into the surprising limitations of modern pre-training: where it breaks down, why it matters, and what new directions might help us move past these barriers.

Zitong Yang (@zitongyang0) 's Twitter Profile Photo

📜 Paper on new pretraining paradigm: Synthetic Bootstrapped Pretraining

SBP goes beyond next-token supervision in a single document by leveraging inter-document correlations to synthesize new data for training — no teacher needed. Validation: 1T data + 3B model from scratch.🧵
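Based only on the description in the tweet, the recipe has three steps: mine pairs of related documents, train the model to generate one document conditioned on a related one (no external teacher), and sample synthetic documents to add back into pretraining. The outline below is a paraphrase of that description with illustrative names and interfaces, not the paper's algorithm or code.

```python
# Hedged outline of the three-step recipe described in the tweet; names, interfaces,
# and the seeding strategy are illustrative.
from typing import Callable, List, Tuple

def synthetic_bootstrapped_pretraining(
    corpus: List[str],
    find_related_pairs: Callable[[List[str]], List[Tuple[str, str]]],   # e.g. nearest neighbors
    train_conditional: Callable[[List[Tuple[str, str]]], Callable[[str], str]],  # learn p(doc_b | doc_a)
    num_synthetic: int,
) -> List[str]:
    """Return the original corpus augmented with synthesized documents."""
    # 1) Mine inter-document correlations: pairs of related documents.
    pairs = find_related_pairs(corpus)
    # 2) Train the model itself to generate a related document given a seed document
    #    (no external teacher involved).
    synthesize = train_conditional(pairs)
    # 3) Sample new documents conditioned on corpus documents and add them to the data.
    seeds = corpus[:num_synthetic]
    synthetic_docs = [synthesize(seed) for seed in seeds]
    return corpus + synthetic_docs
```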