Alex Gajewski (@apagajewski)'s Twitter Profile
Alex Gajewski

@apagajewski

trying to figure out sample efficiency, prev founder @sfcompute, @ExaAILabs

ID: 2628093857

Link: http://alexgajewski.org · Joined: 21-06-2014 09:40:04

88 Tweets

2.2K Followers

865 Following

Raffi Hotter (@raffi_hotter)'s Twitter Profile Photo

Why are MRIs so damn heavy? They weigh over 10,000 pounds, or about 3 cars' worth. MRI design knowledge is hidden inside large companies, so I tried to work it out from first principles. 🧵 (1/12)

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

very interesting talk on the future of mathematics given language models: youtu.be/vYCT7cw0ycw?si… one thing this made me realize is that the next decade or so of math is going to be incredibly fun, as the reasoning models crank through all sorts of new and interesting questions

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

I wonder how much compute it would take to reproduce the entire published body of deep learning knowledge, if you were efficient about it.

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

I wonder if companies training creative models will be more durable than companies training models to perform well-defined tasks. Because it's easier for competitors to hill-climb on the evals for the tasks, but training a creative model requires the founders to have good taste,

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

Has anyone tried “sub-token attention”? Artificially increase the sequence length by including K copies of each token next to each other (say, each linearly projected by a different map), and let the different copies attend to each other. True self-attention :P (And then at the
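The idea above can be sketched in a few lines of NumPy. This is a minimal toy, not the author's implementation: random matrices stand in for the K learned per-copy projections and for the Q/K/V maps, and the "sub-token" expansion just places the K projected copies of each token adjacently before running ordinary scaled dot-product self-attention over the longer sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sub_token_attention(tokens, K, d):
    """Expand each token into K linearly projected copies (adjacent in the
    sequence), then run single-head self-attention over the K*T sequence.
    Random matrices stand in for learned projections in this sketch."""
    T, d_model = tokens.shape
    # One projection map per copy, as the tweet suggests
    P = rng.normal(size=(K, d_model, d_model)) / np.sqrt(d_model)
    # Shape (T, K, d_model) -> (K*T, d_model): copies of token t sit next to each other
    expanded = np.stack([tokens @ P[k] for k in range(K)], axis=1).reshape(K * T, d_model)
    # Standard scaled dot-product self-attention over the expanded sequence
    Wq, Wk, Wv = (rng.normal(size=(d_model, d)) / np.sqrt(d_model) for _ in range(3))
    Q, Km, V = expanded @ Wq, expanded @ Wk, expanded @ Wv
    attn = softmax(Q @ Km.T / np.sqrt(d))
    return attn @ V  # shape (K*T, d)

out = sub_token_attention(rng.normal(size=(5, 8)), K=3, d=8)
print(out.shape)  # (15, 8)
```

The sequence length (and attention cost) grows by K, so copies of the same token can attend to each other as well as to copies of other tokens.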

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

One part of SF Compute we haven’t talked about very much yet is that post-AGI (presumably soon), the models will want to train more models. (Really, people will ask the first models to train more models, or perhaps to solve tasks that would benefit from, say, some custom RL).

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

Very excited for this new cluster. Big enough to train R1, but it's running our combinatorial auction so the prices should be rational
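For flavor, here is the core of a combinatorial auction: the winner-determination problem, which picks the set of disjoint-bundle bids that maximizes total value. This brute-force sketch (with hypothetical bids over GPU block ids) is only illustrative; it is NP-hard in general and SF Compute's actual mechanism is not described here.

```python
from itertools import combinations

def winner_determination(bids):
    """Brute-force winner determination: choose a subset of bids whose
    bundles are pairwise disjoint, maximizing total price.
    `bids` is a list of (bundle_of_items, price) pairs."""
    best = (0.0, ())
    n = len(bids)
    for r in range(1, n + 1):
        for sel in combinations(range(n), r):
            items = [it for i in sel for it in bids[i][0]]
            if len(items) == len(set(items)):  # no item sold twice
                value = sum(bids[i][1] for i in sel)
                if value > best[0]:
                    best = (value, sel)
    return best

# Hypothetical bids: (set of GPU block ids, offered price)
bids = [({"a", "b"}, 10.0), ({"b", "c"}, 8.0), ({"c"}, 5.0), ({"a"}, 4.0)]
print(winner_determination(bids))  # -> (15.0, (0, 2))
```

Bids 0 and 2 win (value 15.0) because their bundles don't overlap, even though bid 1 alone outbids bid 2; pricing per item then falls out of which bundles clear.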

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

Feels like a good time to start a computer control startup. The methods are generally known (RL on top of base models), and it probably doesn't require that much compute, just thoughtful environment design. I would probably start with a text-only representation of websites.
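A text-only representation of a website might look something like the sketch below: strip the page to visible text plus numbered interactive elements, the kind of observation a control agent could act on ("click [2]"). This is a guess at what such a representation could be, built on the standard-library `html.parser`, not a description of any existing product.

```python
from html.parser import HTMLParser

class TextView(HTMLParser):
    """Render a page as visible text plus numbered interactive elements,
    a minimal text-only observation for a computer-control agent."""
    INTERACTIVE = ("a", "button", "input")

    def __init__(self):
        super().__init__()
        self.lines = []       # the text-only observation, line by line
        self.n = 0            # running index for clickable elements
        self._in_interactive = False

    def handle_starttag(self, tag, attrs):
        if tag in self.INTERACTIVE:
            self.n += 1
            self._in_interactive = True
            self.lines.append(f"[{self.n}] <{tag}>")

    def handle_endtag(self, tag):
        if tag in self.INTERACTIVE:
            self._in_interactive = False

    def handle_data(self, data):
        text = data.strip()
        if text:
            if self._in_interactive and self.lines:
                self.lines[-1] += f" {text}"  # label the interactive element
            else:
                self.lines.append(text)       # plain visible text

page = '<h1>Login</h1><a href="/signup">Sign up</a><button>Submit</button>'
v = TextView()
v.feed(page)
print("\n".join(v.lines))
# Login
# [1] <a> Sign up
# [2] <button> Submit
```

An RL policy over this view only has to emit small discrete actions (click an index, type text), which keeps the action space tractable without pixels.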

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

I hope that somebody starts a company to make an AI-native smartwatch. It feels to me like the ideal form factor for most of what I want a language model to do.

Alex Gajewski (@apagajewski)'s Twitter Profile Photo

I wonder what you would get if you trained something Cycle-GAN-like between images and music. Probably possible today with the quality of generative models we have!
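The heart of a CycleGAN-style setup is the cycle-consistency loss: map an image to music, map it back, and penalize the reconstruction error (and symmetrically for music). This toy sketch uses random linear maps as hypothetical stand-ins for the two learned generators, just to show the shape of the objective; a real system would also need adversarial losses and actual image/audio encoders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for learned generators: linear maps between a
# flattened image feature vector and a music feature vector (both dim 16).
G_img2mus = rng.normal(size=(16, 16)) * 0.1  # "image -> music" generator
G_mus2img = rng.normal(size=(16, 16)) * 0.1  # "music -> image" generator

def cycle_consistency_loss(x_img, x_mus):
    """L1 cycle loss from the CycleGAN objective:
    x -> G(x) -> F(G(x)) should reconstruct x, in both directions."""
    img_cycle = (x_img @ G_img2mus) @ G_mus2img  # image -> music -> image
    mus_cycle = (x_mus @ G_mus2img) @ G_img2mus  # music -> image -> music
    return np.abs(img_cycle - x_img).mean() + np.abs(mus_cycle - x_mus).mean()

loss = cycle_consistency_loss(rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))
print(float(loss) > 0)  # positive until the two maps become near-inverses
```

Training would push the two generators toward being approximate inverses, so the interesting question is what structure the learned image↔music correspondence would settle on.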