Alex Gajewski (@apagajewski)'s Twitter Profile
Alex Gajewski

@apagajewski

trying to figure out sample efficiency, prev founder @sfcompute, @ExaAILabs

ID: 2628093857

Link: http://alexgajewski.org · Joined: 21-06-2014 09:40:04

88 Tweets

2.2K Followers

865 Following

Raffi Hotter (@raffi_hotter)'s Twitter Profile Photo

Why are MRIs so damn heavy? They weigh over 10,000 pounds, or about 3 cars' worth. MRI design knowledge is hidden inside large companies, so I tried to work it out from first principles. 🧵 (1/12)

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

very interesting talk on the future of mathematics given language models: youtu.be/vYCT7cw0ycw?si… one thing this made me realize is that the next decade or so of math is going to be incredibly fun, as the reasoning models crank through all sorts of new and interesting questions

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

I wonder how much compute it would take to reproduce the entire published body of deep learning knowledge, if you were efficient about it.

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

I wonder if companies training creative models will be more durable than companies training models to perform well-defined tasks. Because it's easier for competitors to hill-climb on the evals for the tasks, but training a creative model requires the founders to have good taste,

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

Has anyone tried “sub-token attention”? Artificially increase the sequence length by including K copies of each token next to each other (say, each linearly projected by a different map), and let the different copies attend to each other. True self-attention :P (And then at the
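The idea above can be sketched in a few lines of NumPy. This is a minimal toy, not the author's implementation: random matrices stand in for the K learned per-copy projections and for the Q/K/V maps, and the "sub-token" expansion just places the K projected copies of each token adjacently before running ordinary scaled dot-product self-attention over the longer sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sub_token_attention(tokens, K, d):
    """Expand each token into K linearly projected copies (adjacent in the
    sequence), then run single-head self-attention over the K*T sequence.
    Random matrices stand in for learned projections in this sketch."""
    T, d_model = tokens.shape
    # One projection map per copy, as the tweet suggests
    P = rng.normal(size=(K, d_model, d_model)) / np.sqrt(d_model)
    # Shape (T, K, d_model) -> (K*T, d_model): copies of token t sit next to each other
    expanded = np.stack([tokens @ P[k] for k in range(K)], axis=1).reshape(K * T, d_model)
    # Standard scaled dot-product self-attention over the expanded sequence
    Wq, Wk, Wv = (rng.normal(size=(d_model, d)) / np.sqrt(d_model) for _ in range(3))
    Q, Km, V = expanded @ Wq, expanded @ Wk, expanded @ Wv
    attn = softmax(Q @ Km.T / np.sqrt(d))
    return attn @ V  # shape (K*T, d)

out = sub_token_attention(rng.normal(size=(5, 8)), K=3, d=8)
print(out.shape)  # (15, 8)
```

The sequence length (and attention cost) grows by K, so copies of the same token can attend to each other as well as to copies of other tokens.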

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

One part of SF Compute we haven’t talked about very much yet is that post-AGI (presumably soon), the models will want to train more models. (Really, people will ask the first models to train more models, or perhaps to solve tasks that would benefit from, say, some custom RL).

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

Very excited for this new cluster. Big enough to train R1, but it's running our combinatorial auction so the prices should be rational
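For flavor, here is the core of a combinatorial auction: the winner-determination problem, which picks the set of disjoint-bundle bids that maximizes total value. This brute-force sketch (with hypothetical bids over GPU block ids) is only illustrative; it is NP-hard in general and SF Compute's actual mechanism is not described here.

```python
from itertools import combinations

def winner_determination(bids):
    """Brute-force winner determination: choose a subset of bids whose
    bundles are pairwise disjoint, maximizing total price.
    `bids` is a list of (bundle_of_items, price) pairs."""
    best = (0.0, ())
    n = len(bids)
    for r in range(1, n + 1):
        for sel in combinations(range(n), r):
            items = [it for i in sel for it in bids[i][0]]
            if len(items) == len(set(items)):  # no item sold twice
                value = sum(bids[i][1] for i in sel)
                if value > best[0]:
                    best = (value, sel)
    return best

# Hypothetical bids: (set of GPU block ids, offered price)
bids = [({"a", "b"}, 10.0), ({"b", "c"}, 8.0), ({"c"}, 5.0), ({"a"}, 4.0)]
print(winner_determination(bids))  # -> (15.0, (0, 2))
```

Bids 0 and 2 win (value 15.0) because their bundles don't overlap, even though bid 1 alone outbids bid 2; pricing per item then falls out of which bundles clear.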

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

Feels like a good time to start a computer control startup. The methods are generally known (RL on top of base models), and it probably doesn't require that much compute, just thoughtful environment design. I would probably start with a text-only representation of websites.
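A text-only representation of a website might look something like the sketch below: strip the page to visible text plus numbered interactive elements, the kind of observation a control agent could act on ("click [2]"). This is a guess at what such a representation could be, built on the standard-library `html.parser`, not a description of any existing product.

```python
from html.parser import HTMLParser

class TextView(HTMLParser):
    """Render a page as visible text plus numbered interactive elements,
    a minimal text-only observation for a computer-control agent."""
    INTERACTIVE = ("a", "button", "input")

    def __init__(self):
        super().__init__()
        self.lines = []       # the text-only observation, line by line
        self.n = 0            # running index for clickable elements
        self._in_interactive = False

    def handle_starttag(self, tag, attrs):
        if tag in self.INTERACTIVE:
            self.n += 1
            self._in_interactive = True
            self.lines.append(f"[{self.n}] <{tag}>")

    def handle_endtag(self, tag):
        if tag in self.INTERACTIVE:
            self._in_interactive = False

    def handle_data(self, data):
        text = data.strip()
        if text:
            if self._in_interactive and self.lines:
                self.lines[-1] += f" {text}"  # label the interactive element
            else:
                self.lines.append(text)       # plain visible text

page = '<h1>Login</h1><a href="/signup">Sign up</a><button>Submit</button>'
v = TextView()
v.feed(page)
print("\n".join(v.lines))
# Login
# [1] <a> Sign up
# [2] <button> Submit
```

An RL policy over this view only has to emit small discrete actions (click an index, type text), which keeps the action space tractable without pixels.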

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

I hope that somebody starts a company to make an AI-native smartwatch. It feels to me like the ideal form factor for most of what I want a language model to do.

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

I wonder what you would get if you trained something Cycle-GAN-like between images and music. Probably possible today with the quality of generative models we have!
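The heart of a CycleGAN-style setup is the cycle-consistency loss: map an image to music, map it back, and penalize the reconstruction error (and symmetrically for music). This toy sketch uses random linear maps as hypothetical stand-ins for the two learned generators, just to show the shape of the objective; a real system would also need adversarial losses and actual image/audio encoders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for learned generators: linear maps between a
# flattened image feature vector and a music feature vector (both dim 16).
G_img2mus = rng.normal(size=(16, 16)) * 0.1  # "image -> music" generator
G_mus2img = rng.normal(size=(16, 16)) * 0.1  # "music -> image" generator

def cycle_consistency_loss(x_img, x_mus):
    """L1 cycle loss from the CycleGAN objective:
    x -> G(x) -> F(G(x)) should reconstruct x, in both directions."""
    img_cycle = (x_img @ G_img2mus) @ G_mus2img  # image -> music -> image
    mus_cycle = (x_mus @ G_mus2img) @ G_img2mus  # music -> image -> music
    return np.abs(img_cycle - x_img).mean() + np.abs(mus_cycle - x_mus).mean()

loss = cycle_consistency_loss(rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))
print(float(loss) > 0)  # positive until the two maps become near-inverses
```

Training would push the two generators toward being approximate inverses, so the interesting question is what structure the learned image↔music correspondence would settle on.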