Manu Gaur (@gaur_manu)'s Twitter Profile
Manu Gaur

@gaur_manu

Used to do physics, now multiplying matrices @IIIT_Hyderabad Prev @amazon, @UTSResearch

ID: 588134988

Link: https://manugaurdl.github.io/ · Joined: 23-05-2012 06:14:57

2.2K Tweets

213 Followers

913 Following

Wenhao Chai (@wenhaocha1)'s Twitter Profile Photo

An impressive paper that explores the balance between generalization and memorization in diffusion models. The authors argue that diffusion models do not globally underfit their training objectives but instead exhibit selective underfitting: they fit the empirical score function

Manu Gaur (@gaur_manu)'s Twitter Profile Photo

been thinking about recurrence for stupidly long video understanding. will need the right incentives and arch tweaks, but if done correctly would be beautiful

John Nguyen (@__johnnguyen__)'s Twitter Profile Photo

Why add REPA when you can be explicit and use the VLM representation to generate? 🤔

We found the semantic encoder already has the right priors. Train it to sample in its native latent space + lightweight pixel decoder = unified vision model.

But naively using the semantic
Chinmay Kak (@chinmaykak)'s Twitter Profile Photo

Introducing nanosft, a clean single-file implementation of finetuning for chat-style models. Loads gpt2-124M weights on nanogpt and does supervised finetuning using just PyTorch.
a side project I made recently for some prep. link below :)
qts/rts appreciated
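The core of supervised finetuning for chat models is masking the loss so the model is only trained on response tokens, not the prompt. A minimal sketch of that idea in plain PyTorch (the function and tensor shapes here are illustrative assumptions, not nanosft's actual code):

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, targets, prompt_len):
    # logits: (T, V) next-token predictions; targets: (T,) token ids.
    per_token = F.cross_entropy(logits, targets, reduction="none")  # (T,)
    # Zero out loss on the prompt so only response tokens are supervised.
    mask = (torch.arange(targets.size(0)) >= prompt_len).float()
    return (per_token * mask).sum() / mask.sum()

torch.manual_seed(0)
T, V = 10, 50257  # GPT-2 vocabulary size
logits = torch.randn(T, V)
targets = torch.randint(0, V, (T,))
loss = sft_loss(logits, targets, prompt_len=4)
print(loss.item())
```

In practice the mask comes from the chat template (everything up to and including the assistant tag is masked), but the averaging-over-unmasked-tokens pattern is the same.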
Manu Gaur (@gaur_manu)'s Twitter Profile Photo

motivation is a scam, discipline is the way. don't be a slave to your moods. don't be a slave to your environment. be disciplined, be free.

Willis (Nanye) Ma (@ma_nanye)'s Twitter Profile Photo

Excited to introduce DiffuseNNX, a comprehensive JAX/Flax NNX-based library for diffusion and flow matching! It supports multiple diffusion / flow-matching frameworks, Autoencoders, DiT variants, and sampling algorithms. Repo: github.com/willisma/diffu… Delve into details below!
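The flow-matching objective such libraries implement is simple at its core: interpolate linearly between noise and data, and regress a network onto the constant velocity of that path. A hedged sketch of the target construction (illustrative only, not DiffuseNNX's API):

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    # Linear path x_t = (1 - t) * x0 + t * x1 between noise x0 and data x1.
    xt = (1 - t) * x0 + t * x1
    # The target velocity along this path is constant: v = x1 - x0.
    v_target = x1 - x0
    return xt, v_target

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)  # noise sample
x1 = rng.standard_normal(4)  # data sample
xt, v = flow_matching_pair(x0, x1, t=0.3)
# A network v_theta(xt, t) is then trained with an MSE loss against v.
```

Different frameworks (rectified flow, stochastic interpolants, DDPM-style diffusion) mostly differ in the choice of path and target; the training loop is otherwise the same regression.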

Boyang Zheng (@boyangzheng_)'s Twitter Profile Photo

Introducing Representation Autoencoders (RAE)! We revisit the latent space of Diffusion Transformers, replacing VAE with RAE: pretrained representation encoders (DINOv2, SigLIP2) paired with trained ViT decoders. (1/n)
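The RAE recipe as described is: keep a pretrained semantic encoder frozen and train only a decoder to reconstruct pixels from its representations. A minimal sketch of that training step, with toy `nn.Linear` stand-ins for the real DINOv2/SigLIP2 encoder and ViT decoder:

```python
import torch
import torch.nn as nn

# Stand-ins: a frozen pretrained encoder and a trainable pixel decoder.
encoder = nn.Linear(3 * 16 * 16, 64)   # plays the role of DINOv2/SigLIP2
decoder = nn.Linear(64, 3 * 16 * 16)   # plays the role of the ViT decoder

for p in encoder.parameters():
    p.requires_grad = False            # encoder stays frozen

opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)

x = torch.rand(8, 3 * 16 * 16)         # a batch of flattened toy "images"
z = encoder(x)                         # semantic latent (no gradient flows here)
x_hat = decoder(z)                     # reconstruction from the latent
loss = nn.functional.mse_loss(x_hat, x)
loss.backward()                        # gradients reach only the decoder
opt.step()
```

The diffusion transformer is then trained in this latent space `z` instead of a VAE latent; the sketch above covers only the autoencoding half of the paper's setup.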

Manu Gaur (@gaur_manu)'s Twitter Profile Photo

Rohan Choudhury I think the point is that we can use encoders like DINO and SigLIP, and that semantic representations aren't as lossy as we had initially thought - CNN vs. ViT doesn't really matter

Nupur Kumari (@nupurkmr9)'s Twitter Profile Photo

🚀 New preprint! We present NP-Edit, a framework for training an image editing diffusion model without paired supervision. We use differentiable feedback from Vision-Language Models (VLMs) combined with distribution-matching loss (DMD) to learn editing directly. webpage:

Nikhil Keetha (@nik__v__)'s Twitter Profile Photo

Chris Offner Alexandre Morgand That's a curveball question, but here's my intuition/hypothesis: 1. Locality of Task: Depth estimation driven by priors can primarily be thought of as a local task, i.e., given the semantic context of things in the image, you can predict the relative depth - hence why also linear