Milad Aghajohari
@maghajohari
RL for LLM Reasoning.
Multi-Agent RL.
ID: 1275310881199489024
http://miladink.github.io 23-06-2020 06:14:14
204 Tweets
377 Followers
302 Following
A project that at first seemed counterintuitive/weird, but that I came to appreciate the more I heard about it from Amirhossein Kazemnejad and the other authors: you can reason much more efficiently if you discard/forget older reasoning steps and just attend to the recent thoughts. Made me
We RL-train a CoT model to cope with a restricted context (a textual state) and obtain scalable long CoTs (no quadratic cost), plus a puzzling test-time scaling (TTS) behavior where the model actually uses more tokens for harder problems. Kudos to Amirhossein Kazemnejad, Milad Aghajohari, Kamran Chitsaz, who see depth behind
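A back-of-the-envelope way to see the "no quadratic cost" point: count how many tokens each newly generated token attends to. This is a rough sketch under hypothetical assumptions, not the authors' method. Reasoning is split into fixed-size chunks. Each chunk sees only a carried-over state plus its own tokens. The state is simplified here to "the most recent `carry` tokens" rather than a learned textual summary. Cost is measured as attended tokens per generated token. The names `chunk` and `carry` are illustrative only.

```python
def full_context_cost(n_tokens: int) -> int:
    """Attention pairs when token t attends to all t tokens so far (itself included)."""
    return n_tokens * (n_tokens + 1) // 2


def bounded_context_cost(n_tokens: int, chunk: int, carry: int) -> int:
    """Attention pairs when reasoning runs in chunks of `chunk` tokens.

    Hypothetical simplification: each chunk sees a carried state of at most
    `carry` tokens (the most recent tokens of the previous chunk) plus the
    tokens it has generated so far; older reasoning is discarded.
    """
    total = 0
    remaining = n_tokens
    state = 0  # the first chunk starts with an empty carried state
    while remaining > 0:
        size = min(chunk, remaining)
        # token i (1-indexed) in this chunk attends to `state` carried tokens + i chunk tokens
        total += size * state + size * (size + 1) // 2
        remaining -= size
        state = min(carry, size)  # keep only the most recent tokens for the next chunk
    return total


if __name__ == "__main__":
    for n in (10_000, 20_000, 40_000):
        full = full_context_cost(n)
        bounded = bounded_context_cost(n, chunk=1000, carry=500)
        print(f"{n:>6} tokens: full={full:>13,}  bounded={bounded:>12,}")
```

Doubling the reasoning length roughly quadruples the full-context count, while the bounded count only roughly doubles: per-token work is capped at about `carry + chunk` tokens, so total cost grows linearly in length.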