Mrinal Mathur (@bobthemaster)'s Twitter Profile
Mrinal Mathur

@bobthemaster

Research Engineer @Google | @BytedanceTalk | @Amazon | @Apple | @CenterTrends | @ARM

ID: 114763112

Joined: 16-02-2010 14:45:27

8.8K Tweets

363 Followers

639 Following

Manling Li (@manlingli_)'s Twitter Profile Photo

VLAs, VLMs, LLMs, and Vision Foundation Models for Embodied Agents!

There are just so many new updates in recent months!

We have updated our tutorial, come and join us if you would like to discuss the latest advances!

Room: 306B
Time: 1pm-5pm
Slides: …models-meet-embodied-agents.github.io
Jessy Lin (@realjessylin)'s Twitter Profile Photo

🧠 How can we equip LLMs with memory that allows them to continually learn new things?

In our new paper with <a href="/AIatMeta/">AI at Meta</a>, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge.

While full
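The memory-layer idea above can be sketched in a few lines. This is a hedged illustration, not Meta's actual architecture: the memory layer here is a simplified top-k key-value store, and all names and sizes are made up. The point is the sparse update: only the slots retrieved for the new data receive gradients, so the rest of the stored knowledge is untouched.

```python
import torch
import torch.nn as nn

class MemoryLayer(nn.Module):
    """Toy key-value memory: each input attends to its top-k slots."""
    def __init__(self, num_slots=1024, dim=64, top_k=4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, dim))
        self.values = nn.Parameter(torch.randn(num_slots, dim))
        self.top_k = top_k

    def forward(self, x):                            # x: (batch, dim)
        scores = x @ self.keys.T                     # (batch, num_slots)
        top, idx = scores.topk(self.top_k, dim=-1)   # select k slots per input
        weights = top.softmax(dim=-1)
        out = (weights.unsqueeze(-1) * self.values[idx]).sum(dim=1)
        return out, idx

layer = MemoryLayer()
x = torch.randn(2, 64)
out, idx = layer(x)

# "Sparse finetuning": after backprop, mask out gradients for every value
# slot except the ones retrieved for the new data, so existing knowledge
# suffers minimal interference.
loss = out.pow(2).mean()
loss.backward()
mask = torch.zeros_like(layer.values)
mask[idx.flatten().unique()] = 1.0
layer.values.grad *= mask   # only the touched slots will be updated
```

In a real continual-learning setup an optimizer step would follow, updating just those masked slots while the backbone stays frozen.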
Rishabh Agarwal (@agarwl_)'s Twitter Profile Photo

Yuandong is well-respected within Meta, detail-oriented, and technically sharp -- this layoff doesn't make sense, and my hunch is that it might be targeted towards ex-GenAI people. Meta's loss, but could be your win if you're hiring frontier RL researchers ;)

Thinking Machines (@thinkymachines)'s Twitter Profile Photo

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
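The core loss of on-policy distillation can be sketched as follows. This is an illustrative reading of the idea in the tweet, not Thinking Machines' actual implementation: the student generates its own rollout (the on-policy part, giving RL-like error correction), and the teacher supplies a dense per-token target distribution (the SFT-like reward density), combined via a per-token reverse KL.

```python
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student_logits, teacher_logits):
    """Mean per-token reverse KL(student || teacher) on a student rollout.

    Both tensors are (seq_len, vocab), scored on tokens the *student*
    sampled -- that on-policy sampling is what distinguishes this from
    ordinary off-policy distillation on teacher-written text.
    """
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    # Reverse KL penalizes probability mass the student places where the
    # teacher places little, correcting the student's own mistakes.
    return (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()

# Toy rollout: 16 positions, vocab of 100 (shapes are illustrative).
student = torch.randn(16, 100, requires_grad=True)
teacher = torch.randn(16, 100)
loss = on_policy_distill_loss(student, teacher)
loss.backward()
```

In training, the student logits would come from re-scoring its own sampled completion, and the teacher logits from a frozen larger model on the same tokens.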
机器之心 JIQIZHIXIN (@synced_global)'s Twitter Profile Photo

Huge breakthrough from DeepMind!

In their latest Nature paper, “Discovering state-of-the-art reinforcement learning algorithms,” they show that AI can autonomously discover better RL algorithms.

"Enabling machines to discover learning algorithms for themselves is one of the
andrew gao (@itsandrewgao)'s Twitter Profile Photo

colab notebook for on-policy distillation 👇🔗 (for those without <a href="/thinkymachines/">Thinking Machines</a> tinker access)

train qwen-0.6b with OPD to get from 38% -> 60% on GSM8K

works for models without the same tokenizer!
elie (@eliebakouch)'s Twitter Profile Photo

Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training, and infra. 200+ pages of what worked, what didn't, and how to make it run reliably

huggingface.co/spaces/Hugging…
Rishabh Agarwal (@agarwl_)'s Twitter Profile Photo

I was puzzled by their paper's claim that "bfloat16" training crashes -- since we trained stably for 100,000 GPU hours and 7K+ training steps for both dense models and MoEs in the ScaleRL paper without any crashes.

I think it matters what kind of GPUs they used -- they mention in the
Sarthak Mittal (@sarthmit)'s Twitter Profile Photo

Tiny Recursion Models 🔁 meet Amortized Learners 🧠

After Alexia Jolicoeur-Martineau's great talk, realized our framework mirrors it: recursion (Nₛᵤₚ=steps, n,T=1), detach grads but new obs each step → amortizing over long context

Works across generative models, neural processes, & beyond
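The "detach grads but new obs each step" recipe in the tweet can be sketched minimally. This is a hedged, generic illustration (a toy network and made-up shapes, not the authors' framework): a small step function refines a latent over several recursive steps, each step sees a fresh observation, and the latent carried between steps is detached so gradients flow only through the current step.

```python
import torch
import torch.nn as nn

# Tiny step function, reused at every recursion depth (weight-tied).
step_fn = nn.Sequential(nn.Linear(8 + 8, 32), nn.ReLU(), nn.Linear(32, 8))

def recursive_refine(observations, n_steps=4):
    z = torch.zeros(8)
    losses = []
    for t in range(n_steps):
        obs = observations[t % len(observations)]   # new observation each step
        z = step_fn(torch.cat([z.detach(), obs]))   # detach: no backprop through time
        losses.append(z.pow(2).mean())              # illustrative per-step objective
    return z, torch.stack(losses).mean()

obs = [torch.randn(8) for _ in range(4)]
z, loss = recursive_refine(obs)
loss.backward()   # gradients reach step_fn through each step independently
```

Detaching between steps keeps memory constant in recursion depth, which is what lets such tiny models recurse deeply.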

yingzhen (@liyzhen2)'s Twitter Profile Photo

The VCL paper has arguably the first example of modern continual learning for GenAI: VAEs trained on digit/alphabet images 1-by-1 

arxiv.org/abs/1710.10628

Coded by yours truly ☺️ who was (and still is) 🥰 in generative models.

Time to get back to continual learning again?
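The core VCL recipe can be sketched compactly. This is a hedged toy version of the idea in the paper (arxiv.org/abs/1710.10628), not its implementation: weights get a diagonal-Gaussian variational posterior, tasks arrive one by one, and the posterior learned on one task becomes the prior regularizing the next, which is what limits forgetting.

```python
import torch

# Diagonal-Gaussian posterior over a toy 4-dim weight vector.
mu = torch.zeros(4, requires_grad=True)
log_var = torch.zeros(4, requires_grad=True)
prior_mu, prior_var = torch.zeros(4), torch.ones(4)

def kl_to_prior():
    """KL( N(mu, exp(log_var)) || N(prior_mu, prior_var) ), per dimension."""
    var = log_var.exp()
    return 0.5 * (var / prior_var + (mu - prior_mu) ** 2 / prior_var
                  - 1 + prior_var.log() - log_var).sum()

opt = torch.optim.Adam([mu, log_var], lr=0.1)
for task_target in [torch.ones(4), -torch.ones(4)]:       # tasks seen 1-by-1
    for _ in range(50):
        w = mu + log_var.mul(0.5).exp() * torch.randn(4)  # reparameterization
        loss = (w - task_target).pow(2).sum() + kl_to_prior()
        opt.zero_grad(); loss.backward(); opt.step()
    # VCL step: this task's posterior becomes the next task's prior.
    prior_mu = mu.detach().clone()
    prior_var = log_var.exp().detach().clone()
```

The toy "likelihood" here is a squared error to a per-task target; in the paper it would be a VAE objective on each digit/alphabet class.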
Chenxiao Yang @ ICLR2025 (@chenxiao_yang_)'s Twitter Profile Photo

How powerful are Diffusion LLMs? Can they solve problems that Auto-Regressive (AR) LLMs can’t solve? 

Check our new paper "On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond"
🔗 arxiv.org/pdf/2510.06190

In this work, we show that while Diffusion LLMs are indeed more
Trelis Research (@trelisresearch)'s Twitter Profile Photo

- Test-time Adaptation of Tiny Recursive Models - 

New Paper, and the Trelis Submission Approach for the 2025 <a href="/arcprize/">ARC Prize</a> Competition!

In brief:
- <a href="/jm_alexia/">Alexia Jolicoeur-Martineau</a>'s excellent TRM approach does not quite fit in the compute constraints of the ARC Prize competition
- BUT, if you take a
Neel Nanda (@neelnanda5)'s Twitter Profile Photo

It was great to help with this interactive tutorial on SAEs, what they can be used for, and how they work. Fantastic work by the team!
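The SAEs the tutorial covers can be sketched in a few lines. This is a hedged, minimal version (illustrative sizes and names, not the tutorial's code): an overcomplete ReLU dictionary trained to reconstruct model activations, with an L1 penalty that keeps only a few features active per input.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder: d_hidden > d_model, ReLU features."""
    def __init__(self, d_model=16, d_hidden=64):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, acts):
        feats = torch.relu(self.enc(acts))   # sparse feature activations
        recon = self.dec(feats)
        return recon, feats

sae = SparseAutoencoder()
acts = torch.randn(32, 16)                   # stand-in for model activations
recon, feats = sae(acts)

# Reconstruction loss plus L1 sparsity penalty on the features.
loss = (recon - acts).pow(2).mean() + 1e-3 * feats.abs().sum(-1).mean()
loss.backward()
```

After training, each hidden feature is inspected (e.g. via the inputs that maximally activate it) as a candidate interpretable direction in the model's activation space.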

Mrinal Mathur (@bobthemaster)'s Twitter Profile Photo

I’m excited to share that I’ve officially crossed 100 citations on Google Scholar! 📚🔥

My goal has always been to contribute meaningful work in reasoning, multimodal understanding, and foundation model research — and this milestone is a small reminder that consistent effort