miru (@miru_why) Twitter Tweets • TwiCopy

miru

@miru_why

3e-4x engineer, unswizzled wagmi. specialization is for warps

ID: 1744759297211543552

09-01-2024 16:34:15

507 Tweets

1.1K Followers

1.1K Following

miru

@miru_why

a year ago

One Diffusion to Generate Them All arxiv.org/pdf/2411.16318 by framing a long list of tasks (T2I, multiview, depth/semantics/pose estimation…) as ‘frame-sequence completion’, authors train one from-scratch generalist diffusion model to perform the work of many specialist models

3 likes, 1 reply, 0 reposts

miru

@miru_why

a year ago

good 3b1b-style vid on how to write a fast softmax kernel with block reduction, access coalescing, warp reduction, and online normalizer calculation youtube.com/watch?v=IpHjDo…

4 likes, 0 replies, 1 repost
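For readers unfamiliar with the "online normalizer calculation" mentioned in the tweet above, here is a minimal sketch in plain Python. It is not taken from the linked video; the function names `two_pass_softmax` and `online_softmax` are hypothetical, and a real kernel would combine this with the block and warp reductions the tweet lists. The idea is to keep a running maximum and a running sum together, rescaling the sum whenever the maximum changes, so the max and the normalizer come out of a single pass over the input.

```python
import math

def two_pass_softmax(xs):
    """Reference: one pass for the max, a second pass for the normalizer."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def online_softmax(xs):
    """Compute the max and the normalizer in a single pass (finite inputs assumed)."""
    running_max = float("-inf")
    running_sum = 0.0
    for x in xs:
        new_max = max(running_max, x)
        # Rescale the sum accumulated so far to the new max, then add the new term.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(x - new_max)
        running_max = new_max
    # Final pass writes the normalized outputs.
    return [math.exp(x - running_max) / running_sum for x in xs]
```

Both functions should agree up to floating-point rounding; the online version just saves one read of the input, which matters when the kernel is memory-bound.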

miru

@miru_why

a year ago

wait is ‘remove mean subtraction from the norm layers’ really all you need? this looks way nicer than moviegen’s weird outlier penalty loss

7 likes, 2 replies, 0 reposts
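The post this tweet quotes is not included in the feed, so the following is only an illustrative sketch of what "removing mean subtraction" from a norm layer can look like, assuming an RMSNorm-style layer is what is meant. The function names are hypothetical and the learned scale and bias parameters are omitted.

```python
import math

def layer_norm(x, eps=1e-6):
    """Standard normalization: subtract the mean, then divide by the standard deviation."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-6):
    """Same idea without mean subtraction: divide by the root mean square only."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]
```

The first variant recenters the activations around zero, while the second only rescales them and leaves their mean where it was.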

Sander Dieleman

@sedielem

a year ago

📢PSA: #NeurIPS2024 recordings are now publicly available! The workshops always have tons of interesting things on at once, so the FOMO is real😵‍💫 Luckily it's all recorded, so I've been catching up on what I missed. Thread below with some personal highlights🧵

229 likes, 2 replies, 41 reposts

miru

@miru_why

a year ago

the whale has spoken

1.1K likes, 16 replies, 65 reposts

miru

@miru_why

8 months ago

PixelFlow: Pixel-Space Generative Models with Flow github.com/ShoufaChen/Pix… arxiv.org/abs/2504.07963 the authors train a pixel space image generator with gradually-increasing spatial resolution across timesteps, and release 1B-scale class- and text-conditional checkpoints

80 likes, 0 replies, 22 reposts

miru

@miru_why

8 months ago

if you were curious about the torch.sum bug discussed in the gpt-4.5 pretraining podcast (youtu.be/6nJZopACRuQ?si…), here’s the original thread from last june

20 likes, 0 replies, 1 repost

miru

@miru_why

8 months ago

interesting paper on ‘any-subset’ auto-regressive modeling without the standard product-rule factorization arxiv.org/abs/2504.20456… github.com/gabeguo/any-or… their model can sample from the true joint distribution with 10% fewer NFEs (i.e. speculative decoding with no extra model)

9 likes, 0 replies, 0 reposts