Vineet Jain (@thevineetjain)'s Twitter Profile
Vineet Jain

@thevineetjain

PhD candidate @Mila_Quebec and @mcgillu. Previously @mldcmu @Bosch_AI

ID: 1430350595106476037

Website: https://vineetjain96.github.io/ · Joined: 25-08-2021 02:08:07

94 Tweets

505 Followers

372 Following

Aniket Didolkar (@aniket_d98)'s Twitter Profile Photo

🚨Reasoning LLMs are e̵f̵f̵e̵c̵t̵i̵v̵e̵ ̵y̵e̵t̵ inefficient! Large language models (LLMs) now solve multi-step problems by emitting extended chains of thought. During the process, they often re-derive the same intermediate steps across problems, inflating token usage and

Johan S. Obando 👍🏽 (@johanobandoc)'s Twitter Profile Photo

🚨Super excited to share that our paper “Stable Gradients for Stable Learning at Scale in DRL” was accepted at #NeurIPS as a Spotlight! ✨🌟

Paper: arxiv.org/abs/2506.15544

Huge thanks to my amazing coauthors. I’ve learned so much from each of you. 🫶

Onward! 🚀 #NeurIPS2025

Aniket Didolkar (@aniket_d98)'s Twitter Profile Photo

Our work (arxiv.org/abs/2509.13237) can be seen as one instantiation of the paradigm proposed by Andrej Karpathy here. The behavior handbook is a repository of problem-solving strategies which, as we show, can be reused for better and more efficient reasoning in the future. 🧵 for more
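
Going only by the description in this tweet (a stored repository of strategies that gets retrieved and reused instead of re-derived), a minimal sketch of the behavior-handbook idea might look like the following. The `retrieve` and `generate` callables, the prompt wording, and the function names are all illustrative assumptions, not the paper's actual API:

```python
from typing import Callable

# A behavior handbook maps a short behavior name to a one-line strategy.
Handbook = dict[str, str]

def solve_with_behaviors(
    question: str,
    handbook: Handbook,
    retrieve: Callable[[str, Handbook, int], list[str]],  # assumed retriever
    generate: Callable[[str], str],                       # assumed LLM call
    k: int = 3,
) -> str:
    """Prepend the k most relevant stored behaviors to the prompt so the
    model can reuse them instead of re-deriving the same steps."""
    names = retrieve(question, handbook, k)
    hints = "\n".join(f"- {handbook[n]}" for n in names)
    prompt = (
        f"Useful strategies from past problems:\n{hints}\n\n"
        f"Problem: {question}\n"
        "Solve it, reusing the strategies above where they apply."
    )
    return generate(prompt)
```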

Vineet Jain (@thevineetjain)'s Twitter Profile Photo

Your favorite library for training LLMs with RL (most likely) implements the objective incorrectly. I don't know how this happened or why it's so prevalent. We have a preprint coming out soon that will set the record straight. And yes, unbiased gradients do matter!
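
The tweet doesn't say where the bias creeps in, so as a purely hypothetical illustration: one commonly debated implementation choice is normalizing a completion's log-probability by its length before weighting it with the reward. Below is a PyTorch sketch of the plain REINFORCE loss, whose gradient is an unbiased estimate of the gradient of expected reward, next to that length-normalized variant; this is not a claim about what the preprint found:

```python
import torch

def reinforce_loss(
    logprobs: torch.Tensor,   # (B, T) per-token log-probs of sampled completions
    mask: torch.Tensor,       # (B, T) 1.0 for real tokens, 0.0 for padding
    rewards: torch.Tensor,    # (B,) scalar reward per completion
    length_normalize: bool = False,
) -> torch.Tensor:
    seq_logprob = (logprobs * mask).sum(dim=1)  # log pi(y | x)
    if length_normalize:
        # Dividing by length reweights each sequence by 1/|y|, so the
        # resulting gradient is no longer an unbiased estimate of grad E[R].
        seq_logprob = seq_logprob / mask.sum(dim=1)
    # Minimizing -R * log pi(y|x) yields the REINFORCE (score-function)
    # gradient in expectation; a baseline can be subtracted from `rewards`
    # to reduce variance without introducing bias.
    return -(rewards * seq_logprob).mean()
```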

Vineet Jain (@thevineetjain)'s Twitter Profile Photo

RSA is now on arxiv! 🥳 Check out these beautiful animations made by Moksh Jain which explain the basic idea really well. 📄 arxiv.org/abs/2509.26626

Vineet Jain (@thevineetjain)'s Twitter Profile Photo

Thanks for sharing our work, Rohan Paul! I was honestly surprised to see this simple method work so well across tasks and models. There is definitely a lot more we can get out of models using test-time scaling alone.

Sarthak Mittal (@sarthmit)'s Twitter Profile Photo

Exciting to see Qwen adopt RSA (arxiv.org/abs/2509.26626) as their test-time inference method! Note: maintaining a good population size across aggregation is important!

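Going by these tweets alone (a population of candidate solutions that is repeatedly aggregated while its size stays fixed), here is a minimal sketch of what a recursive self-aggregation loop could look like at test time. The prompt wording, the `generate` callable, and the final selection step are assumptions for illustration, not the paper's implementation:

```python
import random
from typing import Callable

def recursive_self_aggregation(
    question: str,
    generate: Callable[[str], str],  # assumed LLM call: prompt -> completion
    population_size: int = 8,        # kept fixed across rounds (see tweet above)
    subset_size: int = 3,            # candidates combined per aggregation call
    rounds: int = 3,
) -> str:
    # Round 0: an initial population of independent candidate solutions.
    population = [
        generate(f"Solve step by step:\n{question}") for _ in range(population_size)
    ]
    for _ in range(rounds):
        new_population = []
        for _ in range(population_size):  # population size stays constant
            subset = random.sample(population, subset_size)
            numbered = "\n\n".join(
                f"Candidate {i + 1}:\n{c}" for i, c in enumerate(subset)
            )
            prompt = (
                f"Problem:\n{question}\n\n{numbered}\n\n"
                "Combine the correct parts of these candidates into a single, "
                "improved solution."
            )
            new_population.append(generate(prompt))
        population = new_population
    return population[0]  # or select by majority vote / a verifier
```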

Aniket Didolkar (@aniket_d98)'s Twitter Profile Photo

LLMs can also come up with their own skills and reuse them to improve capabilities across various tasks.

We explore this in two papers:

📜 - arxiv.org/abs/2405.12205
🧵 - x.com/prfsanjeevaror…

📜 - arxiv.org/abs/2509.13237
🧵 - x.com/Aniket_d98/sta…

Divyat Mahajan (@divyat09)'s Twitter Profile Photo

[1/9] While pretraining data might be hitting a wall, novel methods for modeling it are just getting started! We introduce future summary prediction (FSP), where the model predicts future sequence embeddings to reduce teacher forcing & shortcut learning. 📌Predict a learned
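
The thread is cut off above, but the stated idea, an auxiliary objective where the model predicts embeddings that summarize upcoming tokens rather than only the next token, can be sketched roughly as follows. The mean-pooled summary, the MSE loss, and the linear head are assumptions chosen for illustration, not the paper's actual FSP objective:

```python
import torch
import torch.nn.functional as F

def fsp_auxiliary_loss(
    hidden: torch.Tensor,         # (B, T, D) decoder hidden states
    targets: torch.Tensor,        # (B, T) next-token ids
    embed: torch.nn.Embedding,    # the model's token embedding table
    head: torch.nn.Linear,        # projection D -> D for the prediction
    horizon: int = 8,             # how far into the future to summarize
) -> torch.Tensor:
    B, T = targets.shape
    assert T > horizon, "need at least `horizon` future tokens to summarize"
    future = embed(targets)  # (B, T, D) embeddings of the target tokens
    # At each position t, summarize the next `horizon` tokens by mean-pooling.
    summary = torch.stack(
        [future[:, t : t + horizon].mean(dim=1) for t in range(T - horizon)],
        dim=1,
    ).detach()  # treated here as a fixed regression target; the paper may learn it
    pred = head(hidden[:, : T - horizon])  # predict each position's future summary
    # Added to the usual next-token loss, this pushes the model to encode
    # where the sequence is going, not just the single next token.
    return F.mse_loss(pred, summary)
```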