Milad Aghajohari (@maghajohari) Twitter Tweets • TwiCopy

Milad Aghajohari

@maghajohari

Milad Aghajohari.
RL for LLM Reasoning.
Multi-Agent RL.

ID: 1275310881199489024

Link: http://miladink.github.io · Joined: 23-06-2020 06:14:14

204 Tweets

377 Followers

302 Following

Siva Reddy

@sivareddyg

3 months ago

Linear time thinking, not quadratic 🚀🚀. A recipe to scale RL reasoning linearly -- both inference time and training. Works with existing models out of the box or one can adapt them efficiently to be native linear time thinkers.

Likes: 77 · Replies: 1 · Retweets: 18

Benno Krojer

@benno_krojer

3 months ago

A project that at first seemed counterintuitive/weird but that I started to appreciate the more I heard about it from Amirhossein Kazemnejad and the other authors: you can reason much more efficiently if you discard/forget older reasoning steps and just attend to the recent thoughts. Made me

Likes: 23 · Replies: 1 · Retweets: 3

Milad Aghajohari

@maghajohari

3 months ago

Wanna train your reasoning LLM to think for 10M tokens? Almost impossible under the current paradigm. Check out Markovian Thinker, a simple way to do it now👇

Likes: 34 · Replies: 0 · Retweets: 6

Siva Reddy

@sivareddyg

3 months ago

Yes, with Markovian Thinker reasoning models can think/train without a limit! 1M, 10M tokens. This is not the case with current LongCoT thinking models --- you can only think up to the max limit you are trained for. Happy to take any questions or feedback.

Likes: 17 · Replies: 0 · Retweets: 1

Artem Zholus

@artemzholus

3 months ago

Nice paper! Make the context for reasoning local and train an RL model with such truncation. This way the model "markovinifies" and makes use of its context efficiently!

Likes: 12 · Replies: 0 · Retweets: 3

Alessandro Sordoni

@murefil

3 months ago

We RL-train a CoT model to cope with restricted context (a textual state) and obtain scalable long CoTs (no quadratic cost) + a puzzling TTS behavior where the model actually uses more tokens for harder problems. Kudos to Amirhossein Kazemnejad Milad Aghajohari Kamran Chitsaz who see depth behind

Likes: 22 · Replies: 0 · Retweets: 8

Reza Bayat

@reza_byt

3 months ago

Very late to announce this, but since everyone is losing their minds over recursive models, I’m excited to share that our Mixture-of-Recursions (MoR) paper has been accepted to #NeurIPS2025! 🚀 3× smaller, 2× higher throughput, yet better accuracy! Code below 👇

Likes: 70 · Replies: 4 · Retweets: 6

Amirhossein Kazemnejad

@a_kazemnejad

2 months ago

After nearly 3 years since our NeurIPS paper, SOTA architectures are now adopting NoPE. Kimi Linear uses NoPE for all full-attention layers (not a RoPE hybrid).

Likes: 371 · Replies: 7 · Retweets: 34

Joey Bose

@bose_joey

2 months ago

Come do a PhD with me 😀! Promise of fun science and great coffee ☕

Likes: 734 · Replies: 31 · Retweets: 70

Milad Aghajohari

@maghajohari

2 months ago

High-quality pre-training data for real GUIs! They show that training on this data results in an LLM that executes the textual commands of a planner model on the GUI (their model does the clicks!). Makes you question: Is it better to mix this data in the pre-training of the planner

Likes: 3 · Replies: 0 · Retweets: 0