Fatma Tarlaci
@coderphd
CAIO @soaruplift | PhD | Adj. Asst. Prof. @utcompsci | Former DL Scholar @OpenAI | CS @StanfordEng | ❤️ Dogs & Lifting Weights
ID: 91510093
https://www.linkedin.com/in/fatmatarlaci/
21-11-2009 05:26:47
587 Tweets
1.1K Followers
2.2K Following
Remember the llm.c repro of the GPT-2 (124M) training run? It took 45 min on 8×H100. Since then, Keller Jordan (and by now many others) have iterated on it extensively in the new modded-nanogpt repo, which achieves the same result in only 5 min! Love this repo 👏 ~600 LOC
I don't have too much to add on top of this earlier post on V3, and I think it applies to R1 too (which is the more recent, reasoning-focused equivalent). I will say that Deep Learning has a legendary, ravenous appetite for compute, like no other algorithm that has ever been developed.
TL;DR: we are excited to release a powerful new open-weight language model with reasoning in the coming months, and we want to talk to devs about how to make it maximally useful: openai.com/open-model-fee…

we are excited to make this a very, very good model!

we are planning to