Subham Sahoo (@ssahoo_)'s Twitter Profile
Subham Sahoo

@ssahoo_

PhD candidate @cornell working on Diffusion Language Models. Previously @GoogleAI, @IITKgp.

ID: 155813173

Website: https://s-sahoo.com
Joined: 15-06-2010 06:38:32

99 Tweets

182 Followers

111 Following

Jinjie Ni @ ICLR'25 🇸🇬 (@nijinjie)

🍷Imagine you are the boss of Google DeepMind. To train the best diffusion language model in the world within 1 year, using 800 TPU pods, which model size will you go for? 🐿️ We build Quokka to help you decide–the first-ever large-scale scaling law for DLMs. Interesting facts: 1.

Subham Sahoo (@ssahoo_)

🎓 Officially a doctor now 😊!!! As a first-gen college kid, this moment means the world to me. Grateful beyond words to all my mentors who’ve guided me along the way — from Georg Martius who first introduced me to research back in 2017, to Volodymyr Kuleshov 🇺🇦 who sparked my love for

Justin Deschenaux (@jdeschena)

✨ Masked Generative Models (MGMs) are powerful and can generate tokens in parallel. They’ve driven impressive results across text and images and are increasingly competitive with autoregressive (AR) models. Thrilled to share our latest work to accelerate MGMs (1/12) 🧵

Subham Sahoo (@ssahoo_)

Impressive work by Justin Deschenaux! They propose replacing the encoder-only denoising transformer with an encoder-decoder architecture, which leads to faster training and inference for MDLM.

Zachary Horvitz (@zachary_horvitz)

✨Masked Diffusion Language Models✨ are great for reasoning, but not just for the reasons you think! Fast parallel decoding? 🤔 Any-order decoding? 🤨 Plot twist: MDLMs offer A LOT MORE for inference and post-training! 🎢🧵

The Discrete Diffusion Reading Group (@diffusionllms)

Drowning in the sea of Discrete Diffusion papers? 🌊 We got you. Join our Reading Group! From theory → empirics, and language → molecules — we’ll decode the chaos together 💫 Join the cult—uh, I mean community 😇 👉 Google Group:  groups.google.com/g/diffusion-ll… (1 / 2)

Subham Sahoo (@ssahoo_)

Overwhelmed by the number of Diffusion LLM papers? 🌊 Same here 😭 So I’m starting a Discrete Diffusion Reading Group with my favorite disciples Justin Deschenaux and Zhihan Yang ✨ We’ll cover everything: from theory to empirics, from language to molecules. Join

Subham Sahoo (@ssahoo_)

We’re building a space that connects researchers, students, and practitioners working on discrete diffusion. Join the Discord — collaborate, learn, and share! Whether you’re 💼hiring or showcasing your work, this is the place 👇 Discord: discord.gg/JxSCwpNb

Subham Sahoo (@ssahoo_)

The term AGI gives me the same ick that “AI” did back in 2015. If it takes hundreds of billions of tokens just to get a respectable score on grade school math (GSM8K), that says everything about where we actually are.