Mihir Prabhudesai (@mihirp98) 's Twitter Profile
Mihir Prabhudesai

@mihirp98

CMU Robotics PhD | Research Intern @ Google

ID: 1037594703905136641

Link: http://mihirp1998.github.io/ | Joined: 06-09-2018 06:53:49

86 Tweets

677 Followers

372 Following

CMU Robotics Institute (@cmu_robotics) 's Twitter Profile Photo

Check it out! 🚀 "Diffusion Beats Autoregressive in Data-Constrained Settings" They show that diffusion LLMs outperform autoregressive LLMs when allowed to train for multiple epochs! #CMUrobotics Work from Mihir Prabhudesai & Mengning Wu

Mihir Prabhudesai (@mihirp98) 's Twitter Profile Photo

Extrapolating this trend to robotics, I believe that if one is doing sim2real, they should prefer Autoregressive > Diffusion (compute bottleneck). But if they are doing real-world training, then Autoregressive < Diffusion (data bottleneck). We don't empirically validate this for

Mihir Prabhudesai (@mihirp98) 's Twitter Profile Photo

We ran more experiments, with random token masking, and attention dropout in autoregressive training. Consistent with our earlier ablations, we find these augmentations still overfit quite quickly and are still quite behind diffusion models trained for 500+ epochs. Diffusion
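The two augmentations named above can be sketched as follows. This is an illustrative, hypothetical implementation in plain Python; the function names, mask token id, and rates are ours, not the paper's actual configuration.

```python
# Hypothetical sketch of random token masking and attention dropout for an
# autoregressive training batch. MASK_ID and all rates are assumptions.
import random

MASK_ID = 0  # assumed id of a special [MASK] token

def random_token_mask(tokens, mask_prob=0.15, seed=None):
    """Replace each input token with MASK_ID with probability mask_prob.
    The prediction targets stay unchanged, so the AR model must predict
    the next token from a partially corrupted prefix."""
    rng = random.Random(seed)
    return [MASK_ID if rng.random() < mask_prob else t for t in tokens]

def attention_dropout_mask(seq_len, drop_prob=0.1, seed=None):
    """Build a causal attention mask where some otherwise-allowed (j <= i)
    entries are randomly dropped, mimicking dropout at the mask level."""
    rng = random.Random(seed)
    mask = [[(j <= i) and (rng.random() >= drop_prob) for j in range(seq_len)]
            for i in range(seq_len)]
    for i in range(seq_len):
        mask[i][i] = True  # a token should always attend to itself
    return mask
```

Both corruptions act only on what the model conditions on; the left-to-right prediction objective itself is unchanged, which may be why they help less than a diffusion objective.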

Mihir Prabhudesai (@mihirp98) 's Twitter Profile Photo

We ran more experiments to better understand “why” diffusion models do better in data-constrained settings than autoregressive. Our findings support the hypothesis that diffusion models benefit from learning over multiple token orderings, which contributes to their robustness and
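The "multiple token orderings" hypothesis can be made concrete with a small sketch: a left-to-right AR objective always conditions position i on positions 0..i-1, while a masked (diffusion-style) objective conditions each masked position on a random subset of the others. The function names below are ours, purely for illustration.

```python
# Illustrative contrast between the conditioning sets of a left-to-right AR
# objective and a masked, diffusion-style objective. Not the paper's code.
import random

def ar_conditioning(seq_len):
    """Left-to-right AR: token i is always predicted from {0..i-1}."""
    return {i: set(range(i)) for i in range(seq_len)}

def masked_conditioning(seq_len, mask_prob=0.5, seed=None):
    """Masked objective: each masked token is predicted from all unmasked
    positions -- effectively a different 'ordering' on every draw."""
    rng = random.Random(seed)
    masked = {i for i in range(seq_len) if rng.random() < mask_prob}
    visible = set(range(seq_len)) - masked
    return {i: set(visible) for i in masked}
```

Over many training draws, the masked objective covers many distinct (context, target) pairs per position, while the AR objective repeats the same single ordering each epoch.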

Lucas Beyer (bl16) (@giffmana) 's Twitter Profile Photo

Amazing! Truly open review, through which we all gained more insights. I love it! Result: in the multi-epoch setting, making AR learn multiple orderings ~closes the gap to diffusion, explaining much of the difference. How the truly open review happened (from my vague memory): Mihir

Mihir Prabhudesai (@mihirp98) 's Twitter Profile Photo

In RENT, we showed LLMs can improve without access to answers, by maximizing confidence. In this work, we go further: LLMs can improve without even having the questions. Using self-play, one LLM learns to ask challenging questions, while the other LLM uses confidence to solve them
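A minimal sketch of the confidence signal described above: reward the model for peaked (low-entropy) per-token output distributions, with no reference answer needed. The exact reward in RENT may differ; this is only an illustration, and the function names are ours.

```python
# Hedged sketch of a confidence reward: negative mean entropy of the model's
# per-token output distributions. Illustrative only, not RENT's exact reward.
import math

def token_entropy(probs):
    """Shannon entropy of one token's probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def confidence_reward(step_distributions):
    """Higher when the model's per-token distributions are peaked
    (low entropy); requires no ground-truth answer."""
    if not step_distributions:
        return 0.0
    avg_entropy = sum(token_entropy(p) for p in step_distributions) / len(step_distributions)
    return -avg_entropy  # maximizing confidence = minimizing entropy
```

In a self-play setup of the kind the tweet describes, one could use this reward for the solver while the question-asker is rewarded for producing problems near the solver's confidence frontier.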

Mihir Prabhudesai (@mihirp98) 's Twitter Profile Photo

Nice work -- great to see some of the core findings from our work being validated :) Our original paper post - x.com/mihirp98/statu… Just to clarify the points raised as issues about our work: 1 - Missing scalar term in the loss – This was a typo during paper writing,

Niklas Muennighoff (@muennighoff) 's Twitter Profile Photo

Excited to see recent works push the data-constrained frontier via diffusion LMs! Encoder-Decoders can also repeat a lot more, as T5 showed in 2019 - back to Encoder-Decoders? =D

Mihir Prabhudesai (@mihirp98) 's Twitter Profile Photo

We do not use Eq. 1 - we use Eq. 2. The Eq. 1 reference was a minor typo (missing 1/r term) fixed within 7 days (July 26) of release. 

The authors saw this fix on arxiv but still cite our older versions for some reason.

Using Eq. 1 makes diffusion outperform AR even at 1 epoch
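For intuition on why the missing 1/r term matters: in masked-diffusion objectives of this kind, the cross-entropy over tokens masked at ratio r is commonly reweighted by 1/r so that different mask ratios contribute comparably. The sketch below only illustrates that weighting; the paper's actual equations (Eq. 1 vs. Eq. 2) should be consulted for the real form.

```python
# Hedged sketch of the 1/r reweighting in a masked-diffusion loss.
# Illustrative only; not the paper's Eq. 1 or Eq. 2.
def weighted_masked_loss(per_token_ce, mask_ratio):
    """per_token_ce: cross-entropy values for the masked tokens only.
    Dividing by the mask ratio upweights low-ratio (few-mask) draws."""
    assert 0.0 < mask_ratio <= 1.0
    mean_ce = sum(per_token_ce) / len(per_token_ce)
    return mean_ce / mask_ratio  # the 1/r term
```

Dropping the 1/r factor changes the relative weight of different mask ratios, which is why its presence or absence can flip loss-level comparisons.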
Sachin Goyal (@goyalsachin007) 's Twitter Profile Photo

I myself got confused trying to understand what the serious flaw was, given the new paper had similar takeaways. There is a limit to clickbait stuff please.

Lambda (@lambdaapi) 's Twitter Profile Photo

AI that sees, hears, and reasons: superintelligence starts here.

#LambdaResearch invites all researchers, engineers and AI enthusiasts to participate in the Grand Challenge on Multimodal Superintelligence.

Join us and receive up to $20,000 compute credit per team to build the
Jason Liu (@jasonjzliu) 's Twitter Profile Photo

Ever wish a robot could just move to any goal in any environment—avoiding all collisions and reacting in real time? 🚀 Excited to share our #CoRL2025 paper, Deep Reactive Policy (DRP), a learning-based motion planner that navigates complex scenes with moving obstacles—directly

Jiahui(Jim) Yang (@jiahui_yang6709) 's Twitter Profile Photo

After another wonderful year of neural motion planning research, we are excited to report a major upgrade on our pipeline 🎉 Introducing Deep Reactive Policy (DRP) 🚀 — our #CoRL2025 paper that extends our prior work Neural MP with both generalizability and reactivity while

Homanga Bharadhwaj (@mangahomanga) 's Twitter Profile Photo

I'll be joining the faculty at Johns Hopkins University late next year as a tenure-track assistant professor in JHU Computer Science. Looking for PhD students to join me tackling fun problems in robot manipulation, learning from human data, understanding+predicting physical interactions, and beyond!

Rohan Choudhury (@rchoudhury997) 's Twitter Profile Photo

Excited to release our new preprint - we introduce Adaptive Patch Transformers (APT), a method to speed up vision transformers by using multiple different patch sizes within the same image!
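The core accounting behind mixed patch sizes can be sketched in a few lines: covering the same image with larger patches in some regions yields fewer tokens than a uniform small-patch grid. The region-selection rule here (a fixed list of tiles) is hypothetical, not APT's actual method.

```python
# Illustrative token-count comparison for uniform vs. mixed patch sizes.
# The tiling scheme is an assumption, not APT's real patch-selection rule.
def token_count_uniform(h, w, patch):
    """Tokens for a uniform patch grid over an h x w image."""
    return (h // patch) * (w // patch)

def token_count_mixed(regions):
    """regions: list of (region_h, region_w, patch_size) tiles that together
    cover the image; each tile is patchified at its own size."""
    return sum((rh // p) * (rw // p) for rh, rw, p in regions)
```

For example, a 32x32 image at patch size 8 gives 16 tokens, while patchifying half of it at size 16 and half at size 8 gives 10 — fewer tokens to push through the transformer, hence the speedup.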