Mahan Fathi (@mahanfathi)'s Twitter Profile
Mahan Fathi

@mahanfathi

research @nvidia👁️; ex @googledeepmind, @google🧠 & @mila_quebec.

ID: 321884742

Website: http://mahanfathi.github.io/ · Joined: 22-06-2011 08:52:28

65 Tweets

666 Followers

130 Following

Ross Goroshin (@rgoroshin)'s Twitter Profile Photo

Last week, I gave a talk at Mila - Institut québécois d'IA. The talk should be of interest to anyone working on predictive models, particularly in latent space. In collab. with Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, and Pierre-Luc Bacon. See you at ICLR 2026 in 🇦🇹! drive.google.com/file/d/1mQSXFa…

Mahan Fathi (@mahanfathi)'s Twitter Profile Photo

life update: thrilled to announce that i’ll be joining NVIDIA as a research scientist on the alignment team. grateful for the support from mentors and peers. this is a dream come true for both the researcher and the gamer in me!

Eric Elmoznino (@ericelmoznino)'s Twitter Profile Photo

Introducing our new paper explaining in-context learning through the lens of Occam's razor, giving a normative account of next-token prediction objectives. This was with Tom Marty, Tejas Kasetty, Léo Gagnon, Sarthak Mittal, Mahan Fathi, Dhanya Sridhar, and Guillaume Lajoie. arxiv.org/abs/2410.14086

Guillaume Lajoie (@g_lajoie_)'s Twitter Profile Photo

In-context learning (ICL) is one of the most exciting parts of the LLM boom. Sequence models (not just LLMs) implement on-the-fly models conditioned on inputs w/o weight updates! Q: are in-context models better than «in-weights» ones? A: sometimes ICL is better than standard optimization.
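A toy contrast of the two regimes (my sketch, not from the thread or the paper), using plain least-squares regression as the "model": the in-weights learner fits parameters once and freezes them, while the in-context learner re-solves from whatever examples appear in its context at query time, with no weight update.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(w, n):
    """Sample n (x, y) pairs from a linear task y = w.x + noise."""
    X = rng.normal(size=(n, w.size))
    y = X @ w + 0.1 * rng.normal(size=n)
    return X, y

# "In-weights" learner: parameters are fit once on training data, then frozen.
w_train = rng.normal(size=3)
X_tr, y_tr = make_task(w_train, 200)
w_hat = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]  # fixed after training

def in_weights_predict(x_query):
    return x_query @ w_hat  # ignores any context; reuses frozen weights

# "In-context" learner: conditions on the context pairs at query time,
# effectively fitting a fresh model on the fly -- no stored weights change.
def in_context_predict(X_ctx, y_ctx, x_query):
    w_ctx = np.linalg.lstsq(X_ctx, y_ctx, rcond=None)[0]
    return x_query @ w_ctx

# On a new task the in-weights learner never saw, ICL adapts; frozen weights don't.
w_new = rng.normal(size=3)
X_ctx, y_ctx = make_task(w_new, 32)
x_q = rng.normal(size=3)
print("true:      ", x_q @ w_new)
print("in-weights:", in_weights_predict(x_q))
print("in-context:", in_context_predict(X_ctx, y_ctx, x_q))
```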

Ross Goroshin (@rgoroshin)'s Twitter Profile Photo

The talk I gave @ Mila on learning linearized representations of dynamical systems (Koopman representations) is on YouTube. The work was mainly carried out by Mahan Fathi in collaboration with Pierre-Luc Bacon's lab, and was presented at ICLR 2024. youtube.com/watch?v=wKyN5j…
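A rough sketch of the Koopman idea (a DMD-style toy of mine, not the paper's method): lift the state through a feature map φ and fit one linear operator K so that φ(x_{t+1}) ≈ K φ(x_t); multi-step prediction then becomes repeated matrix multiplication in the lifted space. The pendulum system and hand-picked features below are my assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x, dt=0.05):
    """One Euler step of a damped pendulum: x = (angle, angular velocity)."""
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (-np.sin(theta) - 0.1 * omega)])

def phi(x):
    """Hand-picked lifting features; a learned encoder would replace this."""
    theta, omega = x
    return np.array([1.0, theta, omega, np.sin(theta), np.cos(theta)])

# Collect a trajectory and build lifted snapshot matrices.
xs = [rng.uniform(-1, 1, size=2)]
for _ in range(500):
    xs.append(step(xs[-1]))
Phi = np.stack([phi(x) for x in xs])
A, B = Phi[:-1], Phi[1:]

# Least-squares Koopman operator K: phi(x_{t+1}) ~= K @ phi(x_t).
K = np.linalg.lstsq(A, B, rcond=None)[0].T

# Prediction is now just repeated linear maps in the lifted space.
z = phi(xs[0])
for _ in range(100):
    z = K @ z
print("lifted 100-step prediction:", z[1:3])  # predicted (theta, omega)
print("ground truth:              ", xs[100])
```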

Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

NeMo RL is now open source! It replaces NeMo-Aligner and is the toolkit we use to post-train the next generations of our models. Give it a try: github.com/NVIDIA/NeMo-RL

Shashwat Goel (@shashwatgoel7)'s Twitter Profile Photo

Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers of the Pre-RL models and realized they were severely underreported across papers. We compiled the discrepancies in a blog below🧵👇

Mahan Fathi (@mahanfathi)'s Twitter Profile Photo

We're looking for Summer Interns to join the Post-Training Team at @NVIDIA! DM me your updated resume and three concise bullets detailing your most relevant experience (e.g., publications, repos, blogs). Please RT to help us find top talent.

Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

🚀 Nemotron 3 Nano 30B-A3B is here! Open weights + open data + open source.

AA Intelligence Index: 52 (Artificial Analysis)
✅ 1M‑token context
✅ up to 3.3× higher throughput vs similarly sized open models
✅ stronger reasoning/agentic + chat

Details + links in the thread 🧵
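For anyone who wants to try the release, a minimal Hugging Face transformers sketch; the checkpoint id below is a placeholder, so confirm the actual repo name from the links in the thread.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id -- confirm the real repo name from the release thread.
MODEL_ID = "nvidia/<nemotron-3-nano-checkpoint>"

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",      # let transformers pick the checkpoint dtype
    device_map="auto",       # shard across available GPUs
    trust_remote_code=True,  # custom architectures often require this
)

messages = [{"role": "user", "content": "Explain the Koopman operator in two sentences."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```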
Oleksii Kuchaiev (@kuchaev)'s Twitter Profile Photo

Nemotron 3 Super is here — 120B total / 12B active, Hybrid SSM Latent MoE, designed for Blackwell.
Truly open: permissive license, open data, open training infra. See analysis on Artificial Analysis.

Details in thread 🧵below:
Artificial Analysis (@artificialanlys)'s Twitter Profile Photo

NVIDIA has released Nemotron 3 Super, a 120B (12B active) open weights reasoning model that scores 36 on the Artificial Analysis Intelligence Index with a hybrid Mamba-Transformer MoE architecture

We were given access to this model ahead of launch and evaluated it across…