Vineet Jain (@thevineetjain)'s Twitter Profile
Vineet Jain

@thevineetjain

PhD candidate @Mila_Quebec and @mcgillu. Previously @mldcmu @Bosch_AI

ID: 1430350595106476037

Website: https://vineetjain96.github.io/ · Joined: 25-08-2021 02:08:07

94 Tweets

505 Followers

372 Following

Aniket Didolkar (@aniket_d98)'s Twitter Profile Photo

🚨Reasoning LLMs are e̵f̵f̵e̵c̵t̵i̵v̵e̵ ̵y̵e̵t̵ inefficient! Large language models (LLMs) now solve multi-step problems by emitting extended chains of thought. During the process, they often re-derive the same intermediate steps across problems, inflating token usage and

Johan S. Obando 👍🏽 (@johanobandoc)'s Twitter Profile Photo

🚨Super excited to share that our paper “Stable Gradients for Stable Learning at Scale in DRL” was accepted at #NeurIPS as a Spotlight! ✨🌟

Paper: arxiv.org/abs/2506.15544

Huge thanks to my amazing coauthors. I’ve learned so much from each of you. 🫶

Onward! 🚀 #NeurIPS2025

Aniket Didolkar (@aniket_d98)'s Twitter Profile Photo

Our work (arxiv.org/abs/2509.13237) can be seen as one instantiation of the paradigm proposed by Andrej Karpathy here. The behavior handbook is a repository of problem-solving strategies which, as we show, can be reused for better and more efficient reasoning in the future. 🧵 for more
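
Going only by the description in this tweet (a stored repository of strategies that gets retrieved and reused instead of re-derived), a minimal sketch of the behavior-handbook idea might look like the following. The `retrieve` and `generate` callables, the prompt wording, and the function names are all illustrative assumptions, not the paper's actual API:

```python
from typing import Callable

# A behavior handbook maps a short behavior name to a one-line strategy.
Handbook = dict[str, str]

def solve_with_behaviors(
    question: str,
    handbook: Handbook,
    retrieve: Callable[[str, Handbook, int], list[str]],  # assumed retriever
    generate: Callable[[str], str],                       # assumed LLM call
    k: int = 3,
) -> str:
    """Prepend the k most relevant stored behaviors to the prompt so the
    model can reuse them instead of re-deriving the same steps."""
    names = retrieve(question, handbook, k)
    hints = "\n".join(f"- {handbook[n]}" for n in names)
    prompt = (
        f"Useful strategies from past problems:\n{hints}\n\n"
        f"Problem: {question}\n"
        "Solve it, reusing the strategies above where they apply."
    )
    return generate(prompt)
```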

Vineet Jain (@thevineetjain)'s Twitter Profile Photo

Your favorite library for training LLMs with RL (most likely) implements the objective incorrectly. I don't know how this happened or why it's so prevalent. We have a preprint coming out soon that will set the record straight. And yes, unbiased gradients do matter!
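
The tweet doesn't say where the bias creeps in, so as a purely hypothetical illustration: one commonly debated implementation choice is normalizing a completion's log-probability by its length before weighting it with the reward. Below is a PyTorch sketch of the plain REINFORCE loss, whose gradient is an unbiased estimate of the gradient of expected reward, next to that length-normalized variant; this is not a claim about what the preprint found:

```python
import torch

def reinforce_loss(
    logprobs: torch.Tensor,   # (B, T) per-token log-probs of sampled completions
    mask: torch.Tensor,       # (B, T) 1.0 for real tokens, 0.0 for padding
    rewards: torch.Tensor,    # (B,) scalar reward per completion
    length_normalize: bool = False,
) -> torch.Tensor:
    seq_logprob = (logprobs * mask).sum(dim=1)  # log pi(y | x)
    if length_normalize:
        # Dividing by length reweights each sequence by 1/|y|, so the
        # resulting gradient is no longer an unbiased estimate of grad E[R].
        seq_logprob = seq_logprob / mask.sum(dim=1)
    # Minimizing -R * log pi(y|x) yields the REINFORCE (score-function)
    # gradient in expectation; a baseline can be subtracted from `rewards`
    # to reduce variance without introducing bias.
    return -(rewards * seq_logprob).mean()
```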

Vineet Jain (@thevineetjain)'s Twitter Profile Photo

RSA is now on arxiv! 🥳 Check out these beautiful animations made by Moksh Jain which explain the basic idea really well. 📄 arxiv.org/abs/2509.26626

Vineet Jain (@thevineetjain)'s Twitter Profile Photo

Thanks for sharing our work, Rohan Paul! I was honestly surprised to see this simple method work so well across tasks and models. There is definitely a lot more we can get out of models using test-time scaling alone.

Sarthak Mittal (@sarthmit)'s Twitter Profile Photo

Exciting to see Qwen adopt RSA (arxiv.org/abs/2509.26626) as their test-time inference method! Note: maintaining a good population size across aggregation is important!

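Going by these tweets alone (a population of candidate solutions that is repeatedly aggregated while its size stays fixed), here is a minimal sketch of what a recursive self-aggregation loop could look like at test time. The prompt wording, the `generate` callable, and the final selection step are assumptions for illustration, not the paper's implementation:

```python
import random
from typing import Callable

def recursive_self_aggregation(
    question: str,
    generate: Callable[[str], str],  # assumed LLM call: prompt -> completion
    population_size: int = 8,        # kept fixed across rounds (see tweet above)
    subset_size: int = 3,            # candidates combined per aggregation call
    rounds: int = 3,
) -> str:
    # Round 0: an initial population of independent candidate solutions.
    population = [
        generate(f"Solve step by step:\n{question}") for _ in range(population_size)
    ]
    for _ in range(rounds):
        new_population = []
        for _ in range(population_size):  # population size stays constant
            subset = random.sample(population, subset_size)
            numbered = "\n\n".join(
                f"Candidate {i + 1}:\n{c}" for i, c in enumerate(subset)
            )
            prompt = (
                f"Problem:\n{question}\n\n{numbered}\n\n"
                "Combine the correct parts of these candidates into a single, "
                "improved solution."
            )
            new_population.append(generate(prompt))
        population = new_population
    return population[0]  # or select by majority vote / a verifier
```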

Aniket Didolkar (@aniket_d98)'s Twitter Profile Photo

LLMs can also come up with their own skills and reuse them to improve capabilities across various tasks.

We explore this in two papers:

📜 - arxiv.org/abs/2405.12205
🧵 - x.com/prfsanjeevaror…

📜 - arxiv.org/abs/2509.13237
🧵 - x.com/Aniket_d98/sta…

Divyat Mahajan (@divyat09)'s Twitter Profile Photo

[1/9] While pretraining data might be hitting a wall, novel methods for modeling it are just getting started! We introduce future summary prediction (FSP), where the model predicts future sequence embeddings to reduce teacher forcing & shortcut learning. 📌Predict a learned
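
The thread is cut off above, but the stated idea, an auxiliary objective where the model predicts embeddings that summarize upcoming tokens rather than only the next token, can be sketched roughly as follows. The mean-pooled summary, the MSE loss, and the linear head are assumptions chosen for illustration, not the paper's actual FSP objective:

```python
import torch
import torch.nn.functional as F

def fsp_auxiliary_loss(
    hidden: torch.Tensor,         # (B, T, D) decoder hidden states
    targets: torch.Tensor,        # (B, T) next-token ids
    embed: torch.nn.Embedding,    # the model's token embedding table
    head: torch.nn.Linear,        # projection D -> D for the prediction
    horizon: int = 8,             # how far into the future to summarize
) -> torch.Tensor:
    B, T = targets.shape
    assert T > horizon, "need at least `horizon` future tokens to summarize"
    future = embed(targets)  # (B, T, D) embeddings of the target tokens
    # At each position t, summarize the next `horizon` tokens by mean-pooling.
    summary = torch.stack(
        [future[:, t : t + horizon].mean(dim=1) for t in range(T - horizon)],
        dim=1,
    ).detach()  # treated here as a fixed regression target; the paper may learn it
    pred = head(hidden[:, : T - horizon])  # predict each position's future summary
    # Added to the usual next-token loss, this pushes the model to encode
    # where the sequence is going, not just the single next token.
    return F.mse_loss(pred, summary)
```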