Sulin Liu (@su_lin_liu)'s Twitter Profile
Sulin Liu

@su_lin_liu

Postdoc @MIT Ex: Machine Learning PhD @Princeton @Meta @NTUsg @NUSingapore

ID: 264576956

Link: https://liusulin.github.io/ · Joined: 12-03-2011 03:48:27

216 Tweets

592 Followers

1.1K Following

Lin Zheng (@linzhengisme)'s Twitter Profile Photo

🚀 Meet EvaByte: The best open-source tokenizer-free language model! Our 6.5B byte LM matches modern tokenizer-based LMs with 5x less data & 2x faster decoding, naturally extending to multimodal tasks while fixing tokenization quirks.

💻 Blog: bit.ly/3CjEmTC 

🧵 1/9
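
A minimal sketch of the tokenizer-free idea behind a byte LM (not EvaByte's actual code): the raw UTF-8 bytes of the text serve directly as token IDs, so the vocabulary is fixed at 256 symbols and the usual tokenizer quirks (merge rules, unknown tokens) simply don't arise.

```python
# Minimal sketch of byte-level "tokenization": the UTF-8 bytes of a string
# are used directly as token IDs, so the vocabulary is fixed at 256 symbols
# and no learned tokenizer is needed.

def bytes_to_ids(text: str) -> list[int]:
    """Encode text as a sequence of byte IDs in [0, 255]."""
    return list(text.encode("utf-8"))

def ids_to_text(ids: list[int]) -> str:
    """Decode byte IDs back to text; errors='replace' guards partial splits."""
    return bytes(ids).decode("utf-8", errors="replace")

if __name__ == "__main__":
    ids = bytes_to_ids("tokenization quirks 🚀")
    print(ids[:10])           # plain ASCII maps to one byte per character
    print(ids_to_text(ids))   # round-trips exactly, emoji included
```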
Sulin Liu (@su_lin_liu)'s Twitter Profile Photo

Check out this new paper on how to do planning for discrete diffusion 👏 Really exciting to see more exploration in this direction 🔥

Sitan Chen (@sitanch)'s Twitter Profile Photo

Excited about this new work where we dig into the role of token order in masked diffusions!
MDMs train on some horribly hard tasks, but careful planning at inference can sidestep the hardest ones, dramatically improving over vanilla MDM sampling (e.g. 7%->90% acc on Sudoku)  1/
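
One way to picture the "careful planning at inference" idea is a confidence-ordered unmasking loop. This is a generic sketch, not necessarily the paper's procedure, and `model_logits` is a hypothetical stand-in for a trained masked diffusion model's per-position predictions.

```python
import numpy as np

MASK = -1  # sentinel ID for a still-masked position

def planned_unmask(x, model_logits, rng):
    """Illustrative decoding loop for a masked diffusion model: at every step,
    score each masked position by the model's confidence and fill in the
    easiest one first, instead of committing to a fixed or random order."""
    x = np.array(x)
    while (x == MASK).any():
        logits = model_logits(x)                         # (seq_len, vocab_size)
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        masked = np.where(x == MASK)[0]                  # indices still masked
        confidence = probs[masked].max(-1)               # best prob per slot
        pick = masked[confidence.argmax()]               # plan: easiest slot first
        x[pick] = rng.choice(probs.shape[-1], p=probs[pick])
    return x
```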
Ji-Ha (@ji_ha_kim)'s Twitter Profile Photo

I can’t begin to imagine how strong Anthropic’s internal models must be, since Claude was by far the strongest of the standard non-reasoning models: it’s the only one that could escape getting stuck in loops, a recurring problem that none of the other LLMs has overcome.

Sulin Liu (@su_lin_liu)'s Twitter Profile Photo

Discrete diffusion (including masked language models) deserves more investment in research and compute, especially as we run out of pre-training data for autoregressive LLMs. You can get a lot more data for free just by masking the data or perturbing it with noise.
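
As a sketch of the "more data for free" point: a single sequence yields a different training example every time a fresh mask pattern (or masking rate) is sampled, which is what a masked / discrete-diffusion objective does. The snippet below is illustrative only.

```python
import random

MASK = "[MASK]"

def corrupt(tokens, mask_rate, rng):
    """Randomly mask a fraction of tokens; the (corrupted, original) pair is
    one training example for a masked / discrete-diffusion objective."""
    return [MASK if rng.random() < mask_rate else t for t in tokens]

rng = random.Random(0)
sentence = "discrete diffusion reuses the same text many times over".split()

# The same sentence becomes a different training example each time a new
# mask pattern and masking rate are sampled.
for _ in range(3):
    rate = rng.uniform(0.1, 0.9)
    print(corrupt(sentence, rate, rng))
```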

Sulin Liu (@su_lin_liu)'s Twitter Profile Photo

Grok also tends to do more solution verification at the end of its solutions than ChatGPT. Clearly this cannot be baked in through just RL from verifiable rewards...

Stefano Ermon (@stefanoermon)'s Twitter Profile Photo

Excited to share that I’ve been working on scaling up diffusion language models at Inception. A new generation of LLMs with unprecedented capabilities is coming!

David Duvenaud (@davidduvenaud)'s Twitter Profile Photo

LLMs have complex joint beliefs about all sorts of quantities. And my postdoc James Requeima visualized them! In this thread we show LLM predictive distributions conditioned on data and free-form text. LLMs pick up on all kinds of subtle and unusual structure: 🧵
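
One simple way to probe such a predictive distribution (not the method in the thread) is to ask the same numeric question many times at nonzero temperature and summarize the answers; `llm_sample` below is a hypothetical single-completion client.

```python
import statistics

def empirical_predictive(llm_sample, prompt, n=200):
    """Draw many completions for the same numeric question, parse the answers,
    and summarize them as an empirical predictive distribution.
    `llm_sample(prompt) -> str` is a hypothetical stand-in for whichever
    client returns one completion."""
    values = []
    for _ in range(n):
        reply = llm_sample(prompt)
        try:
            values.append(float(reply.strip().split()[0]))
        except (ValueError, IndexError):
            continue  # skip replies that do not start with a number
    deciles = statistics.quantiles(values, n=10)
    return {"n": len(values),
            "median": statistics.median(values),
            "p10": deciles[0],
            "p90": deciles[-1]}
```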

Federico Cassano (@ellev3n11)'s Twitter Profile Photo

I think that all the "pre-training is dead" takes are bad. The issue with these big, big models is that they are capped by dogwater human-labeled post-training data. We shall continue to scale by exploiting verified RL. Excited to see GPT-4.5 used as the base for the next o-series model.

Kenny Peng (@kennylpeng)'s Twitter Profile Photo

Our lab had a #dogathon 🐕 yesterday where we analyzed NYC Open Data on dog licenses. We learned a lot of dog facts, which I’ll share in this thread 🧵  

1) Geospatial trends: Cavalier King Charles Spaniels are common in Manhattan; the opposite is true for Yorkshire Terriers.
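
A rough sketch of the kind of breed-by-borough tabulation behind that observation; the file name and the column names ("Borough", "BreedName") are assumptions about the NYC Open Data dog-license schema, not the lab's actual analysis code.

```python
import pandas as pd

# Hypothetical sketch: the file name and the column names ("Borough",
# "BreedName") are assumptions about the NYC Open Data dog-license schema.
df = pd.read_csv("nyc_dog_licenses.csv")

counts = (
    df.groupby(["Borough", "BreedName"])
      .size()
      .rename("n")
      .reset_index()
)

# Share of each borough's licenses held by each breed, so breeds can be
# compared across boroughs of very different sizes.
counts["share"] = counts["n"] / counts.groupby("Borough")["n"].transform("sum")

print(
    counts[counts["BreedName"].isin(["Cavalier King Charles Spaniel",
                                     "Yorkshire Terrier"])]
    .sort_values(["BreedName", "share"], ascending=[True, False])
)
```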