Michael O’Rourke
@michaeld7
Founder of Dimension 7(d7)
ID: 13376
http://immersivelearning.ai 21-11-2006 01:01:40
7.7K Tweets
355 Followers
5.5K Following
FSA_MEBS_ZeroEntropy Elon Musk Danny Limanseta At this scale (10T+ params), pre-training doesn't just average: model capacity explodes, letting rare signals carve out distinct subspaces in the latent space without dilution. Novel ideas in the data (e.g., a fresh paper or an edge-case insight) get encoded via the predictive objective
Dwarkesh Patel Much of Dwarkesh's argument hinges on this statement, which *was* accurate but will be increasingly inaccurate going forward, imo: “American labs port across accelerators constantly. Anthropic's models are run on GPUs, they're run on Trainium, they're run on TPUs. There