Lucas Beyer (bl16) (@giffmana)

Researcher (now: OpenAI, ex: DeepMind, Brain, RWTH Aachen), Gamer, Hacker, Belgian. Anon feedback: admonymous.co/giffmana
✗DMs → email

ID: 2236047510
Link: http://lucasb.eyer.be
Joined: 08-12-2013 13:31:09

18.18K Tweets · 88.88K Followers · 593 Following

Lucas Beyer (bl16) @giffmana:

nsys looks pretty cool actually, but information overload for a first-time user. Took me a bit to get good at Google's XProf too, so let's get started!

QQ to my nsys expert followers: any specific pro-tips? Biggest bang-for-buck things/views to look at? Any good pytorch…
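A common first step for nsys-with-PyTorch (my own sketch, not from the thread): wrap regions in NVTX ranges so the nsys timeline shows named spans instead of anonymous kernels. The model, shapes, and file name below are made-up placeholders.

```python
# Hypothetical training script (train.py); model and shapes are placeholders.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(64, 1024, device="cuda")

for step in range(10):
    torch.cuda.nvtx.range_push(f"step_{step}")  # named span in the nsys timeline
    torch.cuda.nvtx.range_push("forward")
    loss = model(x).square().mean()
    torch.cuda.nvtx.range_pop()
    torch.cuda.nvtx.range_push("backward")
    loss.backward()
    torch.cuda.nvtx.range_pop()
    opt.step()
    opt.zero_grad(set_to_none=True)
    torch.cuda.nvtx.range_pop()
torch.cuda.synchronize()
```

Then capture with something like `nsys profile -t cuda,nvtx -o trace python train.py`; the NVTX row in the GUI groups kernels under your labels.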
Lucas Beyer (bl16) @giffmana:

torch.profiler.profiler.py
Last commit: 1mo ago "add memory blablabla"
torch.autograd.profiler.py
Last commit: 2mo ago "Induce inductor blablabla"

One of the two is legacy/deprecated, but you only learn that by looking at the docs of the other one, so if you land on the old one by…
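For the record: torch.autograd.profiler is the legacy one, torch.profiler is the current entry point. A minimal sketch of the modern API (the model, schedule values, and output path are arbitrary):

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(64, 1024, device="cuda")

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=3),  # skip 1, warm up 1, record 3
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./log"),
    profile_memory=True,
) as prof:
    for _ in range(5):
        model(x).sum().backward()
        prof.step()  # advances the wait/warmup/active schedule

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```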
Lucas Beyer (bl16) @giffmana:

I like the Encoder-only Mask Transformer (EoMT): basically removing all the bells and whistles, and doing panoptic segmentation with an almost vanilla ViT.

You're sliiiiightly worse for the same encoder size, but it's a lot simpler/faster and (likely) more scalable. I wish they…
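My reading of the recipe, as a heavily simplified sketch (all names, shapes, and the choice of blocks are mine, not the paper's code): append learnable query tokens to the patch tokens, run everything through ordinary ViT blocks, and read masks off as query-patch dot products.

```python
import torch
import torch.nn as nn

class EoMTSketch(nn.Module):
    def __init__(self, blocks, dim=768, num_queries=100, num_classes=133):
        super().__init__()
        self.blocks = blocks                             # plain ViT encoder blocks
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.cls_head = nn.Linear(dim, num_classes + 1)  # +1 for "no object"

    def forward(self, patch_tokens):                     # (B, N, D) patch embeddings
        B = patch_tokens.shape[0]
        q = self.queries.expand(B, -1, -1)
        x = torch.cat([q, patch_tokens], dim=1)          # queries attend with patches
        for blk in self.blocks:
            x = blk(x)
        nq = self.queries.shape[0]
        q, p = x[:, :nq], x[:, nq:]
        mask_logits = torch.einsum("bqd,bnd->bqn", q, p) # per-query patch masks
        return self.cls_head(q), mask_logits

# Stand-in for real ViT blocks, just to make the sketch runnable.
blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(768, 12, batch_first=True) for _ in range(4)
)
cls_logits, masks = EoMTSketch(blocks)(torch.randn(2, 196, 768))
```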
Lucas Beyer (bl16) @giffmana:

Oh wow, did you guys know that torch.compile can compile numpy code? And even run it on GPU?

This is pretty neat for all kinds of "surrounding" code besides the model (like evals and fancy metrics) that I used to do with numba/numexpr (cuz CPU-XLA was pretty meh).

Poll below
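A sketch of what this looks like, assuming the NumPy-compilation support that landed around PyTorch 2.1: the function below is pure NumPy, torch.compile traces it, and, as I understand the feature, running it under a CUDA device context moves the compute to GPU. The metric itself is made up.

```python
import numpy as np
import torch

@torch.compile
def pairwise_sq_dists(x, y):
    # pure NumPy "surrounding" code, e.g. part of an eval metric
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

x, y = np.random.randn(512, 64), np.random.randn(512, 64)
d_cpu = pairwise_sq_dists(x, y)        # compiled, still CPU, returns np.ndarray

if torch.cuda.is_available():
    with torch.device("cuda"):         # same NumPy code, traced onto the GPU
        d_gpu = pairwise_sq_dists(x, y)
```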
Lucas Beyer (bl16) @giffmana:

The answer is that the name is weird, it's simply (almost) the whole flex_attention bwd compute, not just "zeros" as I thought the name implies.

The way to find out would be looking at the call-stack, opening that generated file with the long name, and then go look at the actual…
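One way to do that digging without clicking through profiler UIs, sketched under my own assumptions: make inductor dump the code it generates and grep it for the kernel name. `TORCH_LOGS="output_code"` is a real knob; the flex_attention snippet below is just a stand-in workload.

```python
# Run with:  TORCH_LOGS="output_code" python this_file.py
# and search the dumped Triton source for the oddly named kernel.
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, b, h, q_idx, kv_idx):
    # standard score_mod example: mask out future positions
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

compiled_fa = torch.compile(flex_attention)
q = torch.randn(2, 8, 128, 64, device="cuda", requires_grad=True)
k = torch.randn(2, 8, 128, 64, device="cuda", requires_grad=True)
v = torch.randn(2, 8, 128, 64, device="cuda", requires_grad=True)
out = compiled_fa(q, k, v, score_mod=causal)
out.sum().backward()  # triggers the generated bwd kernel in question
```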
Lucas Beyer (bl16) @giffmana:

Cool work uses "visual anagrams": two images of different objects made out of the same image patches.

A model must classify both correctly to score. Hence, higher-scoring models use global geometry, lower-scoring ones use textures.

SigLIP is GOAT of course, or I wouldn't repost this (jk)
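The scoring rule as I read it from the tweet, in a tiny sketch (names and shapes are placeholders, not the paper's benchmark code):

```python
import torch

def anagram_pair_accuracy(logits_a, logits_b, labels_a, labels_b):
    """logits_*: (N, C) predictions for the two arrangements of each pair."""
    correct_a = logits_a.argmax(-1) == labels_a
    correct_b = logits_b.argmax(-1) == labels_b
    return (correct_a & correct_b).float().mean()  # credit only if both are right

acc = anagram_pair_accuracy(
    torch.randn(8, 10), torch.randn(8, 10),
    torch.randint(10, (8,)), torch.randint(10, (8,)),
)
```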
Lucas Beyer (bl16) @giffmana:

Interesting alternative to multi-token prediction, though the figure is a bit unintuitive. Instead of attaching a head for each +d'th prediction, pass a dummy input token for each extra prediction through the model. This is A LOT more expensive, e.g. doing 2-step prediction…
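Back-of-envelope on the cost claim, purely my own illustration: if each of k prediction steps adds a dummy token per position, the transformer processes roughly k times the tokens, so attention compute grows roughly quadratically in k, while extra prediction heads would only add small output projections.

```python
# Rough arithmetic only; assumes dummy-token k-step prediction inflates the
# processed sequence from L to ~k*L, while extra-head prediction keeps L fixed.
def attn_flops_ratio(seq_len: int, k: int) -> float:
    return (k * seq_len) ** 2 / seq_len ** 2  # attention is ~O(L^2)

print(attn_flops_ratio(1024, 2))  # 2-step prediction -> ~4x attention compute
```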

Lucas Beyer (bl16) @giffmana:

Or, in other words, Gemini 2.5 Pro succeeds at 30% of real-world office tasks.

That's pretty good, considering this is the worst Gemini Pro will ever be.