Gaurav Ghosal (@gaurav_ghosal)'s Twitter Profile
Gaurav Ghosal

@gaurav_ghosal

Ph.D. Student @mldcmu | Former Undergraduate Student @berkeley_eecs and Researcher @berkeley_ai |

ID: 1618706956885331969

Joined: 26-01-2023 20:26:38

51 Tweets

170 Followers

175 Following

Yiding Jiang (@yidingjiang) 's Twitter Profile Photo

Abitha will be presenting our work on training language models to predict further into the future beyond the next token and the benefits this objective brings. x.com/gm8xx8/status/…
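The tweet does not spell out the objective, but a common way to train beyond next-token prediction is to attach extra heads that each predict a further-ahead token. The sketch below is an illustrative PyTorch version of that idea, not the paper's actual objective or code.

```python
# Hedged sketch: one common way to train beyond next-token prediction, using k output
# heads that each predict a different future offset. Names and the exact objective are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHorizonLM(nn.Module):
    def __init__(self, backbone: nn.Module, d_model: int, vocab_size: int, horizon: int = 4):
        super().__init__()
        self.backbone = backbone  # assumed: a causal transformer returning hidden states
        self.horizon = horizon
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(horizon)]
        )

    def loss(self, input_ids: torch.Tensor) -> torch.Tensor:
        h = self.backbone(input_ids)  # (batch, seq, d_model)
        total = 0.0
        for k, head in enumerate(self.heads, start=1):
            logits = head(h[:, :-k])          # hidden state at position t predicts token t + k
            targets = input_ids[:, k:]
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        return total / self.horizon
```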

Jiayi Geng (@jiayiigeng) 's Twitter Profile Photo

In "Mind Your Step (by Step): Chain‑of‑Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse", we connect human "overthinking" insights to LLM reasoning, offering a new lens on when thinking‑out‑loud backfires. 📄 Read the full paper: arxiv.org/abs/2410.21333

Ziqian Zhong (@fjzzq2002) 's Twitter Profile Photo

🤖 Some company just released a new set of open-weight LLMs well-suited for your production environment. However, you suspect that the models might be trained with backdoors or other hidden malicious behaviors. Is it still possible to deploy these models worry-free? (1/7)

Ravid Shwartz Ziv (@ziv_ravid) 's Twitter Profile Photo

The new OpenAI paper “Why Language Models Hallucinate” is more like PR than research. The claim that hallucinations arise because training/evaluation reward guessing over abstaining is decades-old (reject option classifiers, selective prediction).
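For readers unfamiliar with the cited ideas, the classic reject-option / selective-prediction rule is simply to answer only when confidence clears a threshold and abstain otherwise. The threshold and scoring rule below are illustrative defaults, not from any particular paper.

```python
# Hedged sketch of selective prediction with a reject option: commit to a prediction only
# when the model's confidence clears a threshold, otherwise abstain instead of guessing.
import numpy as np

def predict_or_abstain(probs: np.ndarray, threshold: float = 0.8):
    """probs: predicted class probabilities for one example."""
    best = int(np.argmax(probs))
    confidence = float(probs[best])
    if confidence >= threshold:
        return best   # answer
    return None       # abstain ("I don't know")
```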

Sachin Goyal (@goyalsachin007) 's Twitter Profile Photo

1/Excited to share the first in a series of my research updates on LLM pretraining🚀.
Our new work shows *distilled pretraining*—increasingly used to train deployable models—has trade-offs:
✅ Boosts test-time scaling
⚠️ Weakens in-context learning
✨ Needs tailored data curation
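As a rough illustration of what a distilled pretraining objective typically looks like (not necessarily the exact recipe in this work), one mixes the standard next-token cross-entropy with a KL term toward a teacher's token distribution; the temperature and mixing weight below are illustrative.

```python
# Hedged sketch of a distilled-pretraining objective: next-token cross-entropy on the data
# combined with a KL term toward a teacher model's token distribution.
import torch
import torch.nn.functional as F

def distilled_pretraining_loss(student_logits, teacher_logits, targets,
                               alpha: float = 0.5, temperature: float = 1.0):
    # Standard next-token cross-entropy against the observed tokens.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         targets.view(-1))
    # KL divergence from the teacher's softened distribution to the student's.
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(s_logprobs, t_logprobs, log_target=True, reduction="batchmean")
    return (1 - alpha) * ce + alpha * (temperature ** 2) * kl
```
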
Sachin Goyal (@goyalsachin007) 's Twitter Profile Photo

🚨 Super excited to finally share our Safety Pretraining work — along with all the artifacts (safe data, models, code)! In this thread 🧵, I’ll walk through our journey — the key intermediate observations and lessons, and how they helped shape our final pipeline.

Teachable Machine (@teachableai) 's Twitter Profile Photo

Researchers are working on ways to prevent large language models (LLMs) from simply memorizing information instead of truly learning. They found that removing memorized parts directly can harm the model's ability to learn new things. Their solution, called MemSinks, creates …

Aditi Raghunathan (@adtraghunathan) 's Twitter Profile Photo

There’s been a lot of work on unlearning in LLMs, trying to erase memorization without hurting capabilities — but we haven’t seen much success. ❓What if unlearning is actually doomed from the start? 👇This thread explains why and how *memorization sinks* offer a new way forward.

Pratyush Maini (@pratyushmaini) 's Twitter Profile Photo

One thing years of memorization research has made clear: unlearning is fundamentally hard. Neurons are polysemantic & concepts are massively distributed. There’s no clean 'delete'. We need architectures that are "unlearnable by design". Introducing, Memorization Sinks 🛁⬇️
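One way to read "unlearnable by design" is to route each document's memorization into dedicated parameters that can simply be dropped later. The sketch below is my own illustration of that idea, assuming per-document "sink" units that are gated on only while training on that document; it is not the MemSinks architecture or code.

```python
# Hedged illustration of the "memorization sink" idea: alongside shared hidden units,
# reserve sink units that are active only for a given document during training and are
# zeroed out at inference. Names and gating scheme are illustrative.
import torch
import torch.nn as nn

class SinkMLP(nn.Module):
    def __init__(self, d_model: int, d_shared: int, d_sink: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_shared + d_sink)
        self.down = nn.Linear(d_shared + d_sink, d_model)
        self.d_shared, self.d_sink = d_shared, d_sink

    def sink_mask(self, doc_ids: torch.Tensor) -> torch.Tensor:
        """Deterministically assign each document a small subset of the sink units."""
        mask = torch.ones(doc_ids.size(0), self.d_shared + self.d_sink)
        for i, doc in enumerate(doc_ids.tolist()):
            g = torch.Generator().manual_seed(doc)              # doc id seeds its sink subset
            keep = torch.rand(self.d_sink, generator=g) < 0.1   # ~10% of sinks per document
            mask[i, self.d_shared:] = keep.float()
        return mask

    def forward(self, x, doc_ids=None):
        h = torch.relu(self.up(x))                # (batch, seq, d_shared + d_sink)
        if doc_ids is None:                       # inference: drop all sink units
            h = h * torch.cat([torch.ones(self.d_shared), torch.zeros(self.d_sink)]).to(h)
        else:                                     # training: per-document sink units
            h = h * self.sink_mask(doc_ids).unsqueeze(1).to(h)
        return self.down(h)
```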

Sachin Goyal (@goyalsachin007) 's Twitter Profile Photo

I had early sneak peeks into this exciting work on rethinking pretraining—credits to Gaurav Ghosal, my constant buddy through countless late nights at CMU. It’s been a blast building pretraining frameworks and sharing insights. Gaurav Ghosal’s energy is absolutely unmatched!

Suhas Kotha (@kothasuhas) 's Twitter Profile Photo

Since compute grows faster than the web, we think the future of pre-training lies in the algorithms that will best leverage ♾ compute

We find simple recipes that improve the asymptote of compute scaling laws to be 5x data efficient, offering better perf w/ sufficient compute
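A hedged way to read "improving the asymptote": with the dataset held fixed, loss as a function of compute saturates, and a recipe that lowers the limiting value is equivalent to having more data. The generic form below is a standard parameterization, not necessarily the one fit in the paper.

```latex
% Generic saturating compute-scaling form (illustrative; the paper's exact fit may differ).
% With the dataset D held fixed, more compute C drives the loss toward an asymptote.
L(C \mid D) \;\approx\; L_\infty(D) + A\,C^{-\alpha},
\qquad \lim_{C \to \infty} L(C \mid D) = L_\infty(D).
% "5x data efficient" then means the improved recipe reaches roughly the asymptote
% the baseline recipe would only attain with five times as much data.
```
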
Aditi Raghunathan (@adtraghunathan) 's Twitter Profile Photo

I had the chance to join the TWIML podcast to talk about my group’s ICML 2025 papers! We dug into the surprising limitations of modern pre-training: where it breaks down, why it matters, and what new directions might help us move past these barriers.

Zitong Yang (@zitongyang0) 's Twitter Profile Photo

📜 Paper on new pretraining paradigm: Synthetic Bootstrapped Pretraining

SBP goes beyond next-token supervision in a single document by leveraging inter-document correlations to synthesize new data for training — no teacher needed. Validation: 1T data + 3B model from scratch.🧵
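Based only on the description in the tweet, the recipe has three steps: mine pairs of related documents, train the model to generate one document conditioned on a related one (no external teacher), and sample synthetic documents to add back into pretraining. The outline below is a paraphrase of that description with illustrative names and interfaces, not the paper's algorithm or code.

```python
# Hedged outline of the three-step recipe described in the tweet; names, interfaces,
# and the seeding strategy are illustrative.
from typing import Callable, List, Tuple

def synthetic_bootstrapped_pretraining(
    corpus: List[str],
    find_related_pairs: Callable[[List[str]], List[Tuple[str, str]]],   # e.g. nearest neighbors
    train_conditional: Callable[[List[Tuple[str, str]]], Callable[[str], str]],  # learn p(doc_b | doc_a)
    num_synthetic: int,
) -> List[str]:
    """Return the original corpus augmented with synthesized documents."""
    # 1) Mine inter-document correlations: pairs of related documents.
    pairs = find_related_pairs(corpus)
    # 2) Train the model itself to generate a related document given a seed document
    #    (no external teacher involved).
    synthesize = train_conditional(pairs)
    # 3) Sample new documents conditioned on corpus documents and add them to the data.
    seeds = corpus[:num_synthetic]
    synthetic_docs = [synthesize(seed) for seed in seeds]
    return corpus + synthetic_docs
```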