We’re doing GPU code generation stuff for GPU MODE and we had issues scaling concurrency. Yesterday Simon Guo told me to use Modal for rollout eval.
Hacked it together in like 30 minutes and concurrency is not an issue 🫡
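For anyone curious, a minimal sketch of what a Modal-based rollout eval can look like; the app name, GPU type, and `eval_rollout` body here are illustrative assumptions, not the actual GPU MODE code.

```python
import modal

app = modal.App("rollout-eval")  # hypothetical app name

# Each rollout gets its own container; Modal handles the fan-out,
# so concurrency is just a matter of how many inputs you map over.
@app.function(gpu="H100", timeout=600)
def eval_rollout(kernel_src: str) -> dict:
    # compile + benchmark the generated GPU kernel here (illustrative placeholder)
    ...
    return {"src": kernel_src, "ok": True}

@app.local_entrypoint()
def main():
    candidates = ["kernel_v0.cu", "kernel_v1.cu"]  # generated kernels to evaluate
    # .map() runs the evals in parallel, one container per input
    results = list(eval_rollout.map(candidates))
    print(results)
```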
Charles 🎉 Frye any plans for a cheaper plan?
Distributed training has its own dialect.
I made a pocket dictionary so you don’t open 50 browser tabs every time a paper mentions “ZeRO-offload.”
49 terms, crisp definitions, diagrams where they actually help.
Grab it, skim it, get back to training.
distributedlexicon(.)com
🚀 Expert Parallelism now in 🤗 Transformers! Load the 120B gpt-oss model in under 𝟑 𝐬𝐞𝐜𝐨𝐧𝐝𝐬.
Proud to have added Expert Parallelism to transformers for the OpenAI model, introducing DistributedConfig as a first step toward generalizing the TP support.
🔗 github.com/huggingface/tr…
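A hedged sketch of what loading gpt-oss with expert parallelism might look like; the import path for DistributedConfig and the `enable_expert_parallel` flag name are assumptions based on the tweet and PR description, so check the linked repo for the actual API.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# Assumed import path and flag name; see the linked PR for the real API.
from transformers.distributed import DistributedConfig

model_id = "openai/gpt-oss-120b"

# Expert parallelism shards the MoE experts across ranks instead of
# replicating all of them on every GPU, which is what makes the load fast.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    distributed_config=DistributedConfig(enable_expert_parallel=True),  # assumed flag
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Typically launched across GPUs with something like:
#   torchrun --nproc-per-node=<num_gpus> run_gpt_oss.py
```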
Do any of my GPU middle class mutuals have a legit recipe to get gpt-oss running locally on a B200? I tried and got a bunch of “subprocess exited with error code 1” errors from triton.jit and I’m too tired to fix that 😭
📢 The Ultra-Scale Playbook is now available in print!
📚A deep dive into training Large Language Models efficiently on GPU clusters — from fundamentals to advanced parallelism.
Order here
👉 lulu.com/shop/nouamane-…