Milad Aghajohari (@maghajohari)'s Twitter Profile
Milad Aghajohari

@maghajohari

Milad Aghajohari.
RL for LLM Reasoning.
Multi-Agent RL.

ID: 1275310881199489024

Link: http://miladink.github.io · Joined: 23-06-2020 06:14:14

204 Tweets

377 Followers

302 Following

Siva Reddy (@sivareddyg)'s Twitter Profile Photo

Linear-time thinking, not quadratic 🚀🚀. A recipe to scale RL reasoning linearly -- in both inference and training. Works with existing models out of the box, or they can be adapted efficiently into native linear-time thinkers.
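A back-of-the-envelope illustration of that scaling claim (mine, not the thread's): if every new token attends to the full trace so far, total attention work grows quadratically with trace length; if the visible context is capped at a fixed chunk, it grows linearly. The chunk size below is an arbitrary assumption.

    # Quadratic vs. linear attention cost, counted as "context tokens seen per generated token".
    def longcot_cost(n_tokens: int) -> int:
        """LongCoT: the t-th token attends to all t tokens before it -> ~ n^2 / 2."""
        return sum(t for t in range(1, n_tokens + 1))

    def markovian_cost(n_tokens: int, chunk: int = 8_192) -> int:
        """Chunked thinking: the visible context is capped at `chunk` tokens -> ~ n * chunk."""
        return sum(min(t, chunk) for t in range(1, n_tokens + 1))

    for n in (100_000, 1_000_000, 10_000_000):
        print(f"N={n:>10,}  quadratic={longcot_cost(n):.3e}  linear={markovian_cost(n):.3e}")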

Benno Krojer (@benno_krojer)'s Twitter Profile Photo

A project that at first seemed counterintuitive/weird, but I started to appreciate it the more I heard about it from Amirhossein Kazemnejad and the other authors: you can reason much more efficiently if you discard/forget older reasoning steps and just attend to the recent thoughts. Made me…
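A minimal sketch of that idea as these tweets describe it: generate in fixed-size chunks, and whenever a chunk fills up, drop everything except the question and a short tail of recent thoughts before continuing. The toy generator, chunk size, carryover size, and EOS sentinel are illustrative assumptions, not the paper's actual API.

    import random

    EOS = -1  # sentinel "final answer" token for the toy generator below

    def toy_generate(context: list[int], max_new_tokens: int) -> list[int]:
        """Stand-in for an LLM call: emits random tokens, occasionally the EOS sentinel."""
        out = []
        for _ in range(max_new_tokens):
            tok = EOS if random.random() < 1e-4 else random.randrange(32_000)
            out.append(tok)
            if tok == EOS:
                break
        return out

    def chunked_reasoning(question, generate=toy_generate,
                          chunk_size=8_192, carryover=512, max_total=100_000):
        state = list(question)  # visible context: question + recent thoughts only
        trace = []              # full reasoning trace, never shown to the model all at once
        while len(trace) < max_total:
            new = generate(state, max_new_tokens=chunk_size - len(state))
            trace += new
            if new and new[-1] == EOS:
                break
            # Forget older steps: keep the question plus the last `carryover` tokens of thought.
            state = list(question) + trace[-carryover:]
        return trace

    print(len(chunked_reasoning(question=list(range(100)))))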

Milad Aghajohari (@maghajohari)'s Twitter Profile Photo

Wanna train your reasoning LLM to think for 10M tokens? Almost impossible under the current paradigm. Check out Markovian Thinker, a simple way to do it now 👇

Siva Reddy (@sivareddyg)'s Twitter Profile Photo

Yes, with the Markovian Thinker, reasoning models can think/train without a limit -- 1M, 10M tokens. This is not the case with current LongCoT thinking models: you can only think up to the max limit you were trained for. Happy to take any questions or feedback.

Artem Zholus (@artemzholus)'s Twitter Profile Photo

Nice paper! Make the context for reasoning local and train the model with RL under that truncation. This way the model "markovinifies" and makes use of its context efficiently!

Alessandro Sordoni (@murefil)'s Twitter Profile Photo

We RL-train a CoT model to cope with a restricted context (a textual state) and obtain scalable long CoTs (no quadratic cost), plus a puzzling test-time-scaling (TTS) behavior where the model actually uses more tokens for harder problems. Kudos to Amirhossein Kazemnejad, Milad Aghajohari, and Kamran Chitsaz, who see depth behind…
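A schematic of what RL training over such restricted-context rollouts could look like, assuming a plain REINFORCE-style update in which each chunk is conditioned only on the question plus a small carried-over textual state and receives the trajectory-level reward. The actual algorithm and hyperparameters in the paper may differ; this only shows that gradients stay local to chunks.

    import torch

    def chunked_reinforce_loss(chunk_logprobs: list, reward: float, baseline: float = 0.0) -> torch.Tensor:
        """chunk_logprobs[i]: log-probs of the tokens generated in chunk i, each computed
        from a context of (question + carryover state) only, never the full trace."""
        advantage = reward - baseline
        # Every chunk gets the same trajectory-level reward; no chunk needs the others' activations.
        return -advantage * torch.stack([lp.sum() for lp in chunk_logprobs]).sum()

    # Toy usage with fake per-chunk log-probs:
    fake = [torch.randn(16, requires_grad=True) for _ in range(3)]
    loss = chunked_reinforce_loss(fake, reward=1.0, baseline=0.5)
    loss.backward()
    print(loss.item(), fake[0].grad.shape)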

Reza Bayat (@reza_byt)'s Twitter Profile Photo

Very late to announce this, but since everyone is losing their minds over recursive models, I’m excited to share that our Mixture-of-Recursions (MoR) paper has been accepted to #NeurIPS2025! 🚀 3× smaller, 2× higher throughput, yet better accuracy! Code below 👇
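A rough sketch of the Mixture-of-Recursions idea as announced here: one shared block is applied repeatedly, and a lightweight router decides per token how many recursion passes it gets, which is where the parameter savings come from. Module names and the argmax routing rule are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn

    class MoRBlock(nn.Module):
        def __init__(self, d_model: int = 256, max_recursions: int = 3):
            super().__init__()
            self.shared = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.router = nn.Linear(d_model, max_recursions)  # per-token depth scores
            self.max_recursions = max_recursions

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # depth[b, t] = how many times token t passes through the shared block
            # (argmax routing is only for illustration; a trained router would be differentiable)
            depth = self.router(x).argmax(dim=-1) + 1      # (batch, seq), values 1..max
            for step in range(1, self.max_recursions + 1):
                updated = self.shared(x)
                active = (depth >= step).unsqueeze(-1)     # tokens still recursing
                x = torch.where(active, updated, x)        # the rest keep their representation
            return x

    x = torch.randn(2, 10, 256)
    print(MoRBlock()(x).shape)  # torch.Size([2, 10, 256])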

Amirhossein Kazemnejad (@a_kazemnejad)'s Twitter Profile Photo

Nearly 3 years after our NeurIPS paper, SOTA architectures are adopting NoPE (no positional encoding). Kimi Linear uses NoPE for all of its full-attention layers (not a RoPE hybrid).
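For reference, NoPE attention is just causal self-attention with no RoPE and no absolute position embeddings; position information enters only through the causal mask. A minimal single-head sketch with arbitrary dimensions:

    import torch
    import torch.nn.functional as F

    def nope_causal_attention(x: torch.Tensor, wq, wk, wv) -> torch.Tensor:
        """x: (batch, seq, d). Deliberately no positional term anywhere."""
        q, k, v = x @ wq, x @ wk, x @ wv
        scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        causal = torch.triu(torch.ones(x.shape[1], x.shape[1], dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(causal, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

    d = 64
    x = torch.randn(2, 16, d)
    wq, wk, wv = (torch.randn(d, d) / d ** 0.5 for _ in range(3))
    print(nope_causal_attention(x, wq, wk, wv).shape)  # torch.Size([2, 16, 64])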

Milad Aghajohari (@maghajohari)'s Twitter Profile Photo

High-quality pre-training data for real GUIs! They show that training on this data yields an LLM that executes the textual commands of a planner model on the GUI (their model does the clicks!). Makes you wonder: is it better to mix this data into the pre-training of the planner…
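One way to picture that planner/executor split; every name, prompt, and coordinate below is a hypothetical illustration rather than the paper's interface. The planner emits a textual command, and the GUI-trained model grounds it into a concrete click.

    from dataclasses import dataclass

    @dataclass
    class Click:
        x: int
        y: int

    def planner(goal: str, screen_text: str) -> str:
        """Stand-in for the planner LLM: returns a textual command like 'click the Submit button'."""
        return f"click the element most relevant to: {goal}"

    def gui_executor(command: str, screenshot) -> Click:
        """Stand-in for the GUI-pretrained model: maps a textual command to pixel coordinates."""
        return Click(x=412, y=318)  # dummy grounding result

    action = gui_executor(planner("submit the form", screen_text="..."), screenshot=None)
    print(action)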