Matej Sirovatka (@m_sirovatka)'s Twitter Profile
Matej Sirovatka

@m_sirovatka

MLE @HuggingFace

ID: 1427334810238869508

Joined: 16-08-2021 18:22:23

32 Tweets

61 Followers

94 Following

Matej Sirovatka (@m_sirovatka)

GPU Mode all over the globe, this time in NYC with an amazing speaker list and a very cool hackathon track, courtesy of Jane Street! See you there 🫡

Matej Sirovatka (@m_sirovatka)

We’re doing GPU code generation for stuff GPU MODE and we had issues scaling concurrency. Yesterday Simon Guo told me to use Modal for rollout eval. Hacked it together in like 30 minutes and concurrency is not an issue 🫡 Charles 🎉 Frye any plans to get a cheaper plan

Zach Mueller (@thezachmueller)

Distributed training has its own dialect. I made a pocket dictionary so you don’t open 50 browser tabs every time a paper mentions “ZeRO-offload.” 49 terms, crisp definitions, diagrams where they actually help. Grab it, skim it, get back to training. distributedlexicon(.)com

Nouamane Tazi (@nouamanetazi)

🚀 Expert Parallelism now in 🤗 Transformers! Load the 120B gpt-oss model in under 𝟑 𝐬𝐞𝐜𝐨𝐧𝐝𝐬. Proud to have added Expert Parallelism to transformers for the OpenAI model by adding DistributedConfig as a first step to generalize the TP support. 🔗 github.com/huggingface/tr…

Matej Sirovatka (@m_sirovatka)

Any of my gpu middle class mutuals have a legit recipe to get B200 gpt-oss running locally, I tried and received a bunch of “subprocess exited with error code 1” in triton.jit and I’m too tired to fix that 😭

Nouamane Tazi (@nouamanetazi)

📢 The Ultra-Scale Playbook is now available in print! 📚A deep dive into training Large Language Models efficiently on GPU clusters — from fundamentals to advanced parallelism. Order here 👉 lulu.com/shop/nouamane-…