Matej Sirovatka (@m_sirovatka) Twitter Tweets • TwiCopy

We’re doing GPU code generation for stuff at GPU MODE, and we had issues scaling concurrency. Yesterday Simon Guo told me to use Modal for rollout eval. Hacked it together in like 30 minutes and concurrency is not an issue 🫡 Charles 🎉 Frye, any plans to get a cheaper plan?

108 likes, 7 replies, 5 reposts

Zach Mueller

@thezachmueller

5 months ago

Distributed training has its own dialect. I made a pocket dictionary so you don’t open 50 browser tabs every time a paper mentions “ZeRO-offload.” 49 terms, crisp definitions, diagrams where they actually help. Grab it, skim it, get back to training. distributedlexicon(.)com

589 likes, 10 replies, 44 reposts

Matej Sirovatka

@m_sirovatka

5 months ago

cathedrals are everywhere for those with eyes to see v2.0

8 likes, 1 reply, 0 reposts

Matej Sirovatka

@m_sirovatka

5 months ago

looking forward to Prime Intellect training a model using decentralized context parallelism

8 likes, 0 replies, 0 reposts

Matej Sirovatka

@m_sirovatka

5 months ago

I consider voting for anything other than 4.0 a red flag. I will not elaborate further.

5 likes, 0 replies, 0 reposts

Nouamane Tazi

@nouamanetazi

4 months ago

🚀 Expert Parallelism now in 🤗 Transformers! Load the 120B gpt-oss model in under 𝟑 𝐬𝐞𝐜𝐨𝐧𝐝𝐬. Proud to have added Expert Parallelism to transformers for the OpenAI model by adding DistributedConfig as a first step to generalize the TP support. 🔗 github.com/huggingface/tr…

181 likes, 8 replies, 28 reposts

Matej Sirovatka

@m_sirovatka

4 months ago

Do any of my GPU middle class mutuals have a legit recipe to get B200 gpt-oss running locally? I tried and got a bunch of “subprocess exited with error code 1” in triton.jit, and I’m too tired to fix that 😭

23 likes, 2 replies, 0 reposts

Quentin Gallouédec

@qgallouedec

4 months ago

📺

34 likes, 1 reply, 3 reposts

Nouamane Tazi

@nouamanetazi

4 months ago

📢 The Ultra-Scale Playbook is now available in print! 📚 A deep dive into training Large Language Models efficiently on GPU clusters, from fundamentals to advanced parallelism. Order here 👉 lulu.com/shop/nouamane-…

70 likes, 2 replies, 12 reposts