Manu Gaur (@gaur_manu)'s Twitter Profile
Manu Gaur

@gaur_manu

Used to do physics, now multiplying matrices @IIIT_Hyderabad Prev @amazon, @UTSResearch

ID: 588134988

Link: https://manugaurdl.github.io/ · Joined: 23-05-2012 06:14:57

2.2K Tweets

213 Followers

913 Following

Wenhao Chai (@wenhaocha1)'s Twitter Profile Photo

An impressive paper that explores the balance between generalization and memorization in diffusion models. The authors argue that diffusion models do not globally underfit their training objectives but instead exhibit selective underfitting: they fit the empirical score function

Manu Gaur (@gaur_manu)'s Twitter Profile Photo

been thinking about recurrence for stupidly long video understanding. will need the right incentives and arch tweaks, but if done correctly would be beautiful

John Nguyen (@__johnnguyen__)'s Twitter Profile Photo

Why add REPA when you can be explicit and use the VLM representation to generate? 🤔

We found the semantic encoder already has the right priors. Train it to sample in its native latent space + lightweight pixel decoder = unified vision model.

But naively using the semantic
Chinmay Kak (@chinmaykak)'s Twitter Profile Photo

Introducing nanosft, a clean single-file implementation of finetuning for chat-style models. Loads gpt2-124M weights on nanogpt and does supervised finetuning using just PyTorch.
a side project I made recently for some prep. link below :)
qts/rts appreciated
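The core of supervised finetuning for chat models is masking the loss so the model is only trained on response tokens, not the prompt. A minimal sketch of that idea in plain PyTorch (the function and tensor shapes here are illustrative assumptions, not nanosft's actual code):

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, targets, prompt_len):
    # logits: (T, V) next-token predictions; targets: (T,) token ids.
    per_token = F.cross_entropy(logits, targets, reduction="none")  # (T,)
    # Zero out loss on the prompt so only response tokens are supervised.
    mask = (torch.arange(targets.size(0)) >= prompt_len).float()
    return (per_token * mask).sum() / mask.sum()

torch.manual_seed(0)
T, V = 10, 50257  # GPT-2 vocabulary size
logits = torch.randn(T, V)
targets = torch.randint(0, V, (T,))
loss = sft_loss(logits, targets, prompt_len=4)
print(loss.item())
```

In practice the mask comes from the chat template (everything up to and including the assistant tag is masked), but the averaging-over-unmasked-tokens pattern is the same.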
Manu Gaur (@gaur_manu)'s Twitter Profile Photo

motivation is a scam, discipline is the way. don't be a slave to your moods. don't be a slave to your environment. be disciplined, be free.

Willis (Nanye) Ma (@ma_nanye)'s Twitter Profile Photo

Excited to introduce DiffuseNNX, a comprehensive JAX/Flax NNX-based library for diffusion and flow matching! It supports multiple diffusion / flow-matching frameworks, Autoencoders, DiT variants, and sampling algorithms. Repo: github.com/willisma/diffu… Delve into details below!
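The flow-matching objective such libraries implement is simple at its core: interpolate linearly between noise and data, and regress a network onto the constant velocity of that path. A hedged sketch of the target construction (illustrative only, not DiffuseNNX's API):

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    # Linear path x_t = (1 - t) * x0 + t * x1 between noise x0 and data x1.
    xt = (1 - t) * x0 + t * x1
    # The target velocity along this path is constant: v = x1 - x0.
    v_target = x1 - x0
    return xt, v_target

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)  # noise sample
x1 = rng.standard_normal(4)  # data sample
xt, v = flow_matching_pair(x0, x1, t=0.3)
# A network v_theta(xt, t) is then trained with an MSE loss against v.
```

Different frameworks (rectified flow, stochastic interpolants, DDPM-style diffusion) mostly differ in the choice of path and target; the training loop is otherwise the same regression.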

Boyang Zheng (@boyangzheng_)'s Twitter Profile Photo

Introducing Representation Autoencoders (RAE)! We revisit the latent space of Diffusion Transformers, replacing VAE with RAE: pretrained representation encoders (DINOv2, SigLIP2) paired with trained ViT decoders. (1/n)
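The RAE recipe as described is: keep a pretrained semantic encoder frozen and train only a decoder to reconstruct pixels from its representations. A minimal sketch of that training step, with toy `nn.Linear` stand-ins for the real DINOv2/SigLIP2 encoder and ViT decoder:

```python
import torch
import torch.nn as nn

# Stand-ins: a frozen pretrained encoder and a trainable pixel decoder.
encoder = nn.Linear(3 * 16 * 16, 64)   # plays the role of DINOv2/SigLIP2
decoder = nn.Linear(64, 3 * 16 * 16)   # plays the role of the ViT decoder

for p in encoder.parameters():
    p.requires_grad = False            # encoder stays frozen

opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)

x = torch.rand(8, 3 * 16 * 16)         # a batch of flattened toy "images"
z = encoder(x)                         # semantic latent (no gradient flows here)
x_hat = decoder(z)                     # reconstruction from the latent
loss = nn.functional.mse_loss(x_hat, x)
loss.backward()                        # gradients reach only the decoder
opt.step()
```

The diffusion transformer is then trained in this latent space `z` instead of a VAE latent; the sketch above covers only the autoencoding half of the paper's setup.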

Manu Gaur (@gaur_manu)'s Twitter Profile Photo

Rohan Choudhury I think the point is that we can use encoders like DINO and SigLIP, and that semantic representations aren't as lossy as we had initially thought - CNN vs. ViT doesn't really matter

Nupur Kumari (@nupurkmr9)'s Twitter Profile Photo

🚀 New preprint! We present NP-Edit, a framework for training an image editing diffusion model without paired supervision. We use differentiable feedback from Vision-Language Models (VLMs) combined with distribution-matching loss (DMD) to learn editing directly. webpage:

Nikhil Keetha (@nik__v__)'s Twitter Profile Photo

Chris Offner Alexandre Morgand That's a curveball question, but here's my intuition/hypothesis: 1. Locality of Task: Depth estimation driven by priors can primarily be thought of as a local task, i.e., given the semantic context of things in the image, you can predict the relative depth - hence why also linear