SoftMax (@datagod_v1)'s Twitter Profile

SoftMax

@datagod_v1

In God We trust; all others must bring data and memes

ID: 1537330480139317248

Joined: 16-06-2022 07:05:34

525 Tweets

413 Followers

1.1K Following

Marc Lelarge 🌻 (@marc_lelarge)'s Twitter Profile Photo

Learn ๐—š๐—ฃ๐—จ ๐—ฝ๐—ฟ๐—ผ๐—ด๐—ฟ๐—ฎ๐—บ๐—บ๐—ถ๐—ป๐—ด from the ground up: begin with Numba for low-level control, then progress to Triton to write high-performance kernels in a Python-like language. A hands-on Jupyter notebook to get you started quickly.

Learn ๐—š๐—ฃ๐—จ ๐—ฝ๐—ฟ๐—ผ๐—ด๐—ฟ๐—ฎ๐—บ๐—บ๐—ถ๐—ป๐—ด from the ground up: begin with Numba for low-level control, then progress to Triton to write high-performance kernels in a Python-like language. A hands-on Jupyter notebook to get you started quickly.
Jeff Dean (@jeffdean)'s Twitter Profile Photo

Performance Hints

Over the years, my colleague Sanjay Ghemawat and I have done a fair bit of diving into performance tuning of various pieces of code. We wrote an internal Performance Hints document a couple of years ago as a way of identifying some general principles and we've
anaum (@anaumghori)'s Twitter Profile Photo

some recent reads from this month that I've learned from and that are pretty cool:

1. Inside NVIDIA GPUs: Anatomy of high performance matmul kernels aleksagordic.com/blog/matmul
2. Triton Flash Attention Kernel Walkthrough: The Forward Pass nathanchen.me/public/Triton-…
3. This guy

Patrick Collison (@patrickc)'s Twitter Profile Photo

This work by Cursor is, I think, the coolest AI breakthrough since GPT-4. (And there are plenty of candidates!) simonwillison.net/2026/Jan/19/sc…

Zihao Ye (@ye_combinator)'s Twitter Profile Photo

🚀 MLSys 2026 Contest - NVIDIA Track is LIVE! Registration is now open for the FlashInfer-Bench Challenge! Submit high-performance GPU kernels for cutting-edge LLM architectures on NVIDIA Blackwell GPUs.

Three Tracks
* MoE (Mixture of Experts)
* DSA (Deepseek Sparse Attention)

vLLM (@vllm_project)'s Twitter Profile Photo

Nice work, Abhishek Maiti! 🙌 This kind of write-up helps more folks understand the internals and start building. If you try something new after reading, consider upstreaming it to vLLM. We'd love to collaborate.

Gaurav Sen (@gkcs_)'s Twitter Profile Photo

This paper on the current state of AI Agents is worth reading. Main points:

1. Add memory to agents.
2. Build agents as loops, not pipelines.
3. Go for RL only after the Agent's behavior is reliable.
4. Specify which tool to use when (don't dump 50 tools into a prompt and
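
Points 1, 2, and 4 can be sketched in a few lines. The model callable and tool registry below are invented stand-ins, not any real agent framework API:

```python
# Agent-as-loop sketch: keep calling the model, execute any tool it
# requests, append the result to memory, and stop when it answers.

def run_agent(model, tools, task, max_steps=10):
    memory = [task]                       # point 1: the agent keeps memory
    for _ in range(max_steps):            # point 2: a loop, not a pipeline
        action = model(memory)
        if action["type"] == "answer":
            return action["text"]
        tool = tools[action["tool"]]      # point 4: explicit tool routing
        memory.append(tool(action["input"]))
    return None                           # step budget exhausted

# Toy stand-in model: look something up once, then answer with it.
def toy_model(memory):
    if len(memory) == 1:
        return {"type": "tool", "tool": "lookup", "input": "capital of France"}
    return {"type": "answer", "text": memory[-1]}

tools = {"lookup": lambda query: "Paris"}
result = run_agent(toy_model, tools, "What is the capital of France?")
```

The contrast with a pipeline is that nothing fixes the number or order of tool calls up front; the model decides each step from the accumulated memory.
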

Jino Rohit (@jino_rohit)'s Twitter Profile Photo

not having a GPU is no excuse for not learning CUDA/Triton.

> sign in to platforms like leetgpu/tensara.
> make an account
> solve challenges, look at other solutions, read docs, get better.
vLLM (@vllm_project)'s Twitter Profile Photo

Maintaining separate attention kernels for every GPU platform doesn't scale.

The vLLM Triton attention backend takes a different approach: ~800 lines of Triton, same source code across NVIDIA, AMD, and Intel GPUs. On H100, it matches state-of-the-art attention performance. On
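
The numerical core of flash-attention-style kernels like this one is the online softmax: scan the keys block by block, track a running max, and rescale the partial sums whenever the max grows, so the full score row is never materialized. A pure-Python sketch for a single query row with scalar values (real kernels apply the same update per tile of the K/V matrices):

```python
import math

def online_softmax_weighted_sum(scores, values):
    """Compute sum_i softmax(scores)_i * values[i] in one streaming pass."""
    m = float("-inf")  # running max of scores seen so far
    denom = 0.0        # running sum of exp(score - m)
    acc = 0.0          # running softmax-weighted sum of values
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new) if m > float("-inf") else 0.0
        w = math.exp(s - m_new)
        denom = denom * scale + w   # rescale old sum to the new max
        acc = acc * scale + w * v
        m = m_new
    return acc / denom
```

Because the rescaling is exact, the result matches a two-pass softmax; that is what lets one kernel handle arbitrarily long key sequences in fixed on-chip memory.
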
Ali Taha (@aliestaha)'s Twitter Profile Photo

- 230 training runs
- 1,623 GPU hours (67 B200 days)
- 76 TB of training data
- a 2x faster model

Every paper said it can't be done. Quantization Aware Distillation made it possible.
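
The tweet doesn't share code, but a standard building block behind quantization-aware training and distillation is "fake quantization": round the weights onto a low-bit grid during the forward pass so the student learns under the same error it will see after deployment. An illustrative symmetric per-tensor int8 version (per-channel scales and the straight-through gradient trick are omitted):

```python
def fake_quantize_int8(weights):
    """Symmetric per-tensor int8 fake quantization.

    Snaps each weight to the nearest point on a 255-level grid spanning
    [-max|w|, +max|w|], then maps it back to float. Training through
    these rounded weights lets the model adapt to quantization error.
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / 127.0                          # int8 symmetric scale
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return [qi * scale for qi in q]                  # dequantize back to float

w = [0.1, -0.52, 1.27, 0.0]
wq = fake_quantize_int8(w)
```

During distillation the student's forward pass runs through weights like `wq` while its outputs are matched against the full-precision teacher's.
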

Joe (@joedab12)'s Twitter Profile Photo

If you guys aren't following the CEO of Cerebras, you should be. One of the brightest guys in the industry. Great write-up for the technical analysis guys who know nothing about AI. I'm somewhere in the middle and learned a lot here. Andrew Feldman would love to hear your

Victor M (@victormustar)'s Twitter Profile Photo

Very hyped by the new Cohere Transcribe model 🌍 Works surprisingly well on bad-quality audio when the mic doesn't cooperate. 2B params, 14 supported languages, and it's Apache 2.0. Try the official Hugging Face demo ⬇️

Daily Dose of Data Science (@dailydoseofds_)'s Twitter Profile Photo

NVIDIA & Unsloth dropped one of the best practical guides on building RL environments from scratch, filling gaps most tutorials skip.

Covers:

- Why RL environments matter + how to build them
- When RL beats SFT
- GRPO & RL best practices
- How verifiable rewards & RLVR work
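
The GRPO step such guides describe boils down to a group-relative advantage: sample several completions per prompt, score each with the verifiable reward, and normalize within the group, so no learned value network is needed. A hedged sketch (population std is used here; implementations vary in the exact normalization):

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (reward - group mean) / group std.

    The group of completions sampled for one prompt serves as its own
    baseline, which is why GRPO can drop the value network PPO requires.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Verifiable reward (RLVR-style): 1.0 if the completion's answer checks
# out exactly, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]   # 4 completions sampled for one prompt
advantages = grpo_advantages(rewards)
```

Each advantage then weights the policy-gradient term for its completion's tokens, pushing probability toward completions that beat their own group's average.
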