Mahan Fathi (@mahanfathi)'s Twitter Profile
Mahan Fathi

@mahanfathi

research @nvidia👁️; ex @googledeepmind, @google🧠 & @mila_quebec.

ID: 321884742

Website: http://mahanfathi.github.io/ · Joined: 22-06-2011 08:52:28

65 Tweets

666 Followers

130 Following

Ross Goroshin (@rgoroshin)'s Twitter Profile Photo

Last week, I gave a talk at Mila - Institut québécois d'IA. The talk should be of interest to anyone working on predictive models, particularly in latent space. In collab. with Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, and Pierre-Luc Bacon. See you at ICLR 2026 in 🇦🇹! drive.google.com/file/d/1mQSXFa…

Mahan Fathi (@mahanfathi)'s Twitter Profile Photo

life update: thrilled to announce that i’ll be joining NVIDIA as a research scientist on the alignment team. grateful for the support from mentors and peers. this is a dream come true for both the researcher and the gamer in me!

Eric Elmoznino (@ericelmoznino)'s Twitter Profile Photo

Introducing our new paper explaining in-context learning through the lens of Occam's razor, giving a normative account of next-token prediction objectives. This was with Tom Marty, Tejas Kasetty, Léo Gagnon, Sarthak Mittal, Mahan Fathi, Dhanya Sridhar, and Guillaume Lajoie. arxiv.org/abs/2410.14086

Guillaume Lajoie (@g_lajoie_)'s Twitter Profile Photo

In-context learning (ICL) is one of the most exciting parts of the LLM boom. Sequence models (not just LLMs) implement on-the-fly models conditioned on inputs w/o weight updates! Q: are in-context models better than «in-weights» ones? A: sometimes ICL is better than standard optimization.
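A toy contrast of the two regimes (my sketch, not from the thread or the paper), using plain least-squares regression as the "model": the in-weights learner fits parameters once and freezes them, while the in-context learner re-solves from whatever examples appear in its context at query time, with no weight update.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(w, n):
    """Sample n (x, y) pairs from a linear task y = w.x + noise."""
    X = rng.normal(size=(n, w.size))
    y = X @ w + 0.1 * rng.normal(size=n)
    return X, y

# "In-weights" learner: parameters are fit once on training data, then frozen.
w_train = rng.normal(size=3)
X_tr, y_tr = make_task(w_train, 200)
w_hat = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]  # fixed after training

def in_weights_predict(x_query):
    return x_query @ w_hat  # ignores any context; reuses frozen weights

# "In-context" learner: conditions on the context pairs at query time,
# effectively fitting a fresh model on the fly -- no stored weights change.
def in_context_predict(X_ctx, y_ctx, x_query):
    w_ctx = np.linalg.lstsq(X_ctx, y_ctx, rcond=None)[0]
    return x_query @ w_ctx

# On a new task the in-weights learner never saw, ICL adapts; frozen weights don't.
w_new = rng.normal(size=3)
X_ctx, y_ctx = make_task(w_new, 32)
x_q = rng.normal(size=3)
print("true:      ", x_q @ w_new)
print("in-weights:", in_weights_predict(x_q))
print("in-context:", in_context_predict(X_ctx, y_ctx, x_q))
```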

Ross Goroshin (@rgoroshin)'s Twitter Profile Photo

The talk I gave @ Mila on learning linearized representations of dynamical systems (Koopman representations) is on YouTube. The work was mainly carried out by Mahan Fathi in collaboration with Pierre-Luc Bacon's lab, and was presented at ICLR 2024. youtube.com/watch?v=wKyN5j…
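A rough sketch of the Koopman idea (a DMD-style toy of mine, not the paper's method): lift the state through a feature map φ and fit one linear operator K so that φ(x_{t+1}) ≈ K φ(x_t); multi-step prediction then becomes repeated matrix multiplication in the lifted space. The pendulum system and hand-picked features below are my assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x, dt=0.05):
    """One Euler step of a damped pendulum: x = (angle, angular velocity)."""
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (-np.sin(theta) - 0.1 * omega)])

def phi(x):
    """Hand-picked lifting features; a learned encoder would replace this."""
    theta, omega = x
    return np.array([1.0, theta, omega, np.sin(theta), np.cos(theta)])

# Collect a trajectory and build lifted snapshot matrices.
xs = [rng.uniform(-1, 1, size=2)]
for _ in range(500):
    xs.append(step(xs[-1]))
Phi = np.stack([phi(x) for x in xs])
A, B = Phi[:-1], Phi[1:]

# Least-squares Koopman operator K: phi(x_{t+1}) ~= K @ phi(x_t).
K = np.linalg.lstsq(A, B, rcond=None)[0].T

# Prediction is now just repeated linear maps in the lifted space.
z = phi(xs[0])
for _ in range(100):
    z = K @ z
print("lifted 100-step prediction:", z[1:3])  # predicted (theta, omega)
print("ground truth:              ", xs[100])
```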

Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

NeMo RL is now open source! It replaces NeMo-Aligner and is the toolkit we use to post-train the next generations of our models. Give it a try: github.com/NVIDIA/NeMo-RL

Shashwat Goel (@shashwatgoel7)'s Twitter Profile Photo

Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers of the Pre-RL models and realized they were severely underreported across papers. We compiled the discrepancies in a blog below🧵👇

Mahan Fathi (@mahanfathi)'s Twitter Profile Photo

We're looking for Summer Interns to join the Post-Training Team at @NVIDIA! DM me your updated resume and three concise bullets detailing your most relevant experience (e.g., publications, repos, blogs). Please RT to help us find top talent.

Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

🚀 Nemotron 3 Nano 30B-A3B is here! Open weights + open data + open source.

AA Intelligence Index: 52 (Artificial Analysis)
✅ 1M‑token context
✅ up to 3.3× higher throughput vs similarly sized open models
✅ stronger reasoning/agentic + chat

Details + links in the thread 🧵
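For anyone who wants to try the release, a minimal Hugging Face transformers sketch; the checkpoint id below is a placeholder, so confirm the actual repo name from the links in the thread.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id -- confirm the real repo name from the release thread.
MODEL_ID = "nvidia/<nemotron-3-nano-checkpoint>"

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",      # let transformers pick the checkpoint dtype
    device_map="auto",       # shard across available GPUs
    trust_remote_code=True,  # custom architectures often require this
)

messages = [{"role": "user", "content": "Explain the Koopman operator in two sentences."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```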
Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

Nemotron 3 Super is here — 120B total / 12B active, Hybrid SSM Latent MoE, designed for Blackwell.
Truly open: permissive license, open data, open training infra. See analysis on Artificial Analysis.

Details in thread 🧵below:
Artificial Analysis (@artificialanlys)'s Twitter Profile Photo

NVIDIA has released Nemotron 3 Super, a 120B (12B active) open weights reasoning model that scores 36 on the Artificial Analysis Intelligence Index with a hybrid Mamba-Transformer MoE architecture

We were given access to this model ahead of launch and evaluated it across…