Mark Saroufim (@marksaroufim) Twitter Tweets • TwiCopy

Mark Saroufim

@marksaroufim

@pytorch dev broadly interested in performance https://t.co/6KJ328JUwv

ID:35473191

Link: http://marksaroufim.substack.com

Joined: 26-04-2009 14:20:43

1.6K Tweets

8.9K Followers

656 Following

Mark Saroufim

@marksaroufim

3 days ago

llm.cpp was finally published today. It's very much CUDA C++ the good parts.
Code: github.com/gevtushenko/ll…
Talk: youtube.com/watch?v=WiB_3C…
Speakers: twitter.com/g_evtushenko and Jake Hemstad

Mark Saroufim

@marksaroufim

1 week ago

dev-discuss.pytorch.org/t/how-to-measu…

account_circle

Really excited to officially release torchtune: a PyTorch-native library for easily fine-tuning LLMs!

Code: github.com/pytorch/torcht…
Blog: pytorch.org/blog/torchtune…
Tutorials: pytorch.org/torchtune/stab…

[1/5]

Mark Saroufim

@marksaroufim

3 weeks ago

Got a sneak peek, best triton tutorial I've read so far. Grokked the differences between the triton & CUDA programming model. Gentler than official triton docs and goes into advanced topics like swizzling by the end

Tomorrow Saturday April 13 at noon PST discord.gg/cudamode

Mark Saroufim

@marksaroufim

1 month ago

Weird we haven't found better naming conventions for quantization algorithms: 'int4', for example, is vague. That's the weight dtype, but it's only applied to some layers or parts of them, accumulation is always in fp32, and the gradient, optimizer and activation dtypes are all different too
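
To make the point about 'int4' concrete, here is a minimal sketch of a descriptor that spells out each dtype separately. The class `QuantScheme`, its field names, and the example values are hypothetical illustrations, not a PyTorch API or an existing naming convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantScheme:
    """Hypothetical descriptor: one field per dtype that a bare 'int4' label hides."""
    weight_dtype: str
    activation_dtype: str
    accumulation_dtype: str
    gradient_dtype: str
    optimizer_state_dtype: str
    quantized_layers: tuple  # layer kinds the weight dtype is applied to

    def label(self) -> str:
        # Build an explicit name instead of the single word "int4".
        layers = ",".join(self.quantized_layers)
        return (
            f"w={self.weight_dtype}[{layers}] a={self.activation_dtype} "
            f"acc={self.accumulation_dtype} g={self.gradient_dtype} "
            f"opt={self.optimizer_state_dtype}"
        )


# Illustrative values only (assumed, not taken from any real recipe).
scheme = QuantScheme(
    weight_dtype="int4",
    activation_dtype="bf16",
    accumulation_dtype="fp32",
    gradient_dtype="bf16",
    optimizer_state_dtype="fp32",
    quantized_layers=("linear",),
)
print(scheme.label())  # w=int4[linear] a=bf16 acc=fp32 g=bf16 opt=fp32
```

Two schemes that would both be called 'int4' can differ in every field except the first, which is the ambiguity the tweet describes.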

53 likes · 0 replies · 4 retweets

Mark Saroufim

@marksaroufim

1 month ago

If you're looking to influence PyTorch's roadmap for lower precision dtypes, quantization and sparsity algorithms please leave some feedback on github.com/pytorch-labs/a…

This is from the team that brought you the sam-fast and gpt-fast quantization kernels

William Falcon ⚡️

@_willfalcon

1 month ago

Highly recommend this video on writing optimized cuda kernels

by Mark Saroufim from the PyTorch team.

Perf checklist:
- coalesced global memory access
- maximize occupancy
- memory or compute bound
- minimize control divergence
... + 4 other items

youtube.com/watch?v=SGhfUh…
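
The "memory or compute bound" item in the checklist above can be estimated on paper with a roofline-style comparison. The sketch below is not from the video; the function names and the hardware figures (10 TFLOP/s, 1 TB/s) are made-up assumptions chosen for round numbers.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte read from or written to global memory."""
    return flops / bytes_moved


def bound_by(flops: float, bytes_moved: float,
             peak_flops_per_s: float, peak_bytes_per_s: float) -> str:
    """Classify a kernel by comparing its intensity to the hardware ridge point."""
    ridge = peak_flops_per_s / peak_bytes_per_s  # FLOPs/byte needed to saturate compute
    if arithmetic_intensity(flops, bytes_moved) >= ridge:
        return "compute"
    return "memory"


# Elementwise square of n fp32 values: one multiply per element,
# one 4-byte read and one 4-byte write per element.
n = 1_000_000
flops = n
bytes_moved = n * (4 + 4)

print(arithmetic_intensity(flops, bytes_moved))   # 0.125
# Hypothetical GPU: 10 TFLOP/s peak compute, 1 TB/s peak bandwidth.
print(bound_by(flops, bytes_moved, 10e12, 1e12))  # memory
```

A kernel classified as memory bound gains more from fixing access patterns, such as coalescing, than from reducing arithmetic.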

Mark Saroufim

@marksaroufim

3 months ago

I've often heard 'I wish PyTorch had more dev internals documentation' when in reality the problem is we have too much. PyTorch is a deep project and it touches on pretty much all aspects of computer science so here are my favorite references

Intro
Christian S. Perone for an overview of…

Mark Saroufim

@marksaroufim

3 months ago

On the subject of codegen I also wanna plug

from torch.utils.cpp_extension import load_inline

pass it a cuda kernel as a string and it'll generate the right build scripts for you
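
A minimal sketch of that workflow, assuming a machine with a CUDA-capable GPU, the CUDA toolkit, and a CUDA build of PyTorch. The kernel, the extension name `square_ext`, and the helper `build_square_extension` are illustrative names, not part of the original tweet; the kernel assumes a contiguous float32 CUDA tensor.

```python
# CUDA source passed as a plain string; load_inline adds the torch headers itself.
cuda_source = r"""
__global__ void square_kernel(const float* x, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = x[i] * x[i];
}

// Assumes x is a contiguous float32 tensor that already lives on the GPU.
torch::Tensor square(torch::Tensor x) {
    auto out = torch::empty_like(x);
    int n = x.numel();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    square_kernel<<<blocks, threads>>>(x.data_ptr<float>(), out.data_ptr<float>(), n);
    return out;
}
"""

# C++ declaration so the generated binding code can see the function.
cpp_source = "torch::Tensor square(torch::Tensor x);"


def build_square_extension():
    """Compile the strings above into an importable module (needs nvcc and a GPU)."""
    # Imported lazily so that defining the sources does not require torch.
    from torch.utils.cpp_extension import load_inline

    return load_inline(
        name="square_ext",
        cpp_sources=cpp_source,
        cuda_sources=cuda_source,
        functions=["square"],
        with_cuda=True,
    )
```

On a suitable machine, `ext = build_square_extension()` followed by `ext.square(torch.arange(6, dtype=torch.float32, device="cuda"))` should return the elementwise squares; the first call is slow because it compiles the extension.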

86 likes · 0 replies · 6 retweets

Andreas Köpf

@neurosp1ke

3 months ago

❤️‍🔥CUDA MODE
Lecture 1: How to profile CUDA in PyTorch

Mark Saroufim lays the foundation: How to build & call a cuda kernel from torch, how to profile it.

Today, Jan 13
12:00 PM PST (Bay Area)
9:00 PM CET (Berlin)

Join us here: discord.gg/rTFYjfzp?event…

Mark Saroufim

@marksaroufim

4 months ago

CUDA kernels in Google Colab!

Ashvini Jindal

@akjindal53244

4 months ago

🌟 First time at NeurIPS! 🌟

🚀 Excited to announce that our team 𝑼𝒑𝒂𝒚𝒂 (Ankur pawan rajpoot Ashvini Jindal) secured first rank 🏆 in NeurIPS 𝗟𝗟𝗠 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲: 𝟭 𝗟𝗟𝗠 + 𝟭𝗚𝗣𝗨 + 𝟭𝗗𝗮𝘆: llm-efficiency-challenge.github.io organized by…