Sylvain Gugger (@guggersylvain)'s Twitter Profile
Sylvain Gugger

@guggersylvain

Machine Learning at Jane Street. Previously at @huggingface and @fastdotai. Co-author of github.com/fastai/fastbook. He/him.

ID: 976897777589456897

Website: http://sgugger.github.io
Joined: 22-03-2018 19:05:54

1.1K Tweets

25.25K Followers

350 Following

Horace He (@chhillee)

For too long, users have lived under the software lottery tyranny of fused attention implementations. No longer. Introducing FlexAttention, a new PyTorch API allowing for many attention variants to enjoy fused kernels in a few lines of PyTorch. pytorch.org/blog/flexatten… 1/10

Charlie Marsh (@charliermarsh)

Today, we're shipping a series of features that move uv beyond a pip alternative, and into an end-to-end solution for managing Python projects, command-line tools, single-file scripts, and even Python itself. A single, unified tool. Like Cargo, for Python. It's very fast.

Marc Sun (@_marcsun)

Happy to share that we are (pre)releasing Accelerate V1.0.0! 🔥 It's been an incredible journey since I joined the accelerate team 1.5 years ago, and there's plenty of exciting updates on the way. Learn more about this milestone here: huggingface.co/blog/accelerat…

Yaron (Ron) Minsky (@yminsky)

A new Signals and Threads! This one is an interview with the great Sylvain Gugger, all about making GPUs go brrr... signalsandthreads.com/the-uncertain-…

Sylvain Gugger (@guggersylvain)

I had a lot of fun talking with Yaron (Ron) Minsky about GPU performance (go brrr!) and the common pitfalls to avoid. signalsandthreads.com/the-uncertain-…

PyTorch (@pytorch)

PyTorch 2.5 is here 🔥 We are excited to announce the release of #PyTorch 2.5, featuring a new CuDNN backend for SDPA, regional compilation of torch.compile, & TorchInductor CPP backend performance speedup. Read more in our blog: hubs.la/Q02TRs9p0

Horace He (@chhillee)

Jane Street tech talks have always been super awesome. So I'm quite excited to be visiting Jane Street on Monday to give a talk on building ML systems for a trillion trillion FLOPs :) I'll talk about a bunch of fun things, including cool GPU optimizations, how I think about

Sylvain Gugger (@guggersylvain)

We had an awesome talk at Jane Street from the amazing Horace He on scaling ML systems to and I just realized the recording is now online: youtu.be/139UPjoq7Kw?si…

Stas Bekman (@stasbekman)

This is huge, huge, huge - DeepSpeed is now a community-owned project as it's now a part of the Linux Foundation. Committer access should be possible now. Thank you, Microsoft Research for breathing life into this very important to the ML community scalability framework and now

GPU MODE (@gpu_mode)

Write a fast kernel and run it on Discord. See how you compare against the best! If you're familiar with Leetcode, Kaggle or Codeforces then this should feel right at home

Benjamin F Spector (@bfspector)

(1/7) Inspired by DeepSeek's FlashMLA, we're releasing ThunderMLA—a fused megakernel optimized for variable-prompt decoding! ⚡️🐱ThunderMLA is up to 35% faster than FlashMLA and just 400 LoC. Blog: bit.ly/4kubAAK With Aaryan Singhal, Dan Fu, and @hazyresearch!

João Gante (@joao_gante)

Speculative Decoding before: limited choices, the draft model must have the same tokenizer 😬 Speculative Decoding now: unlimited choices, ANY draft model can be used and better speedup opportunities 😎 The folks at Intel have been cooking, and Speculative Decoding (with

Vijay (@__tensorcore__)

🚨🔥 CUTLASS 4.0 is released 🔥🚨

pip install nvidia-cutlass-dsl

4.0 marks a major shift for CUTLASS: towards native GPU programming in Python

docs.nvidia.com/cutlass/media/…