SoftMax (@datagod_v1)'s Twitter Profile

SoftMax

@datagod_v1

In God We trust; all others must bring data and memes

ID: 1537330480139317248

Joined: 16-06-2022 07:05:34

525 Tweets

413 Followers

1.1K Following

Marc Lelarge 🌻 (@marc_lelarge)'s Twitter Profile Photo

Learn ๐—š๐—ฃ๐—จ ๐—ฝ๐—ฟ๐—ผ๐—ด๐—ฟ๐—ฎ๐—บ๐—บ๐—ถ๐—ป๐—ด from the ground up: begin with Numba for low-level control, then progress to Triton to write high-performance kernels in a Python-like language. A hands-on Jupyter notebook to get you started quickly.

Learn ๐—š๐—ฃ๐—จ ๐—ฝ๐—ฟ๐—ผ๐—ด๐—ฟ๐—ฎ๐—บ๐—บ๐—ถ๐—ป๐—ด from the ground up: begin with Numba for low-level control, then progress to Triton to write high-performance kernels in a Python-like language. A hands-on Jupyter notebook to get you started quickly.
Jeff Dean (@jeffdean)'s Twitter Profile Photo

Performance Hints

Over the years, my colleague Sanjay Ghemawat and I have done a fair bit of diving into performance tuning of various pieces of code. We wrote an internal Performance Hints document a couple of years ago as a way of identifying some general principles and we've
anaum (@anaumghori)'s Twitter Profile Photo

some recent reads from this month that I've learned from and that are pretty cool:

1. Inside NVIDIA GPUs: Anatomy of high performance matmul kernels aleksagordic.com/blog/matmul
2. Triton Flash Attention Kernel Walkthrough: The Forward Pass nathanchen.me/public/Triton-…
3. This guy

Patrick Collison (@patrickc)'s Twitter Profile Photo

This work by Cursor is, I think, the coolest AI breakthrough since GPT-4. (And there are plenty of candidates!) simonwillison.net/2026/Jan/19/sc…

Zihao Ye (@ye_combinator)'s Twitter Profile Photo

🚀 MLSys 2026 Contest - NVIDIA Track is LIVE! Registration is now open for the FlashInfer-Bench Challenge! Submit high-performance GPU kernels for cutting-edge LLM architectures on NVIDIA Blackwell GPUs.

Three Tracks
* MoE (Mixture of Experts)
* DSA (Deepseek Sparse Attention)

vLLM (@vllm_project)'s Twitter Profile Photo

Nice work, Abhishek Maiti! 🙌 This kind of write-up helps more folks understand the internals and start building. If you try something new after reading, consider upstreaming it to vLLM. We'd love to collaborate.

Gaurav Sen (@gkcs_)'s Twitter Profile Photo

This paper on the current state of AI Agents is worth reading. Main points:

1. Add memory to agents.
2. Build agents as loops, not pipelines.
3. Go for RL only after the Agent's behavior is reliable.
4. Specify which tool to use when (don't dump 50 tools into a prompt and
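
Points 1, 2, and 4 can be sketched in a few lines. The model callable and tool registry below are invented stand-ins, not any real agent framework API:

```python
# Agent-as-loop sketch: keep calling the model, execute any tool it
# requests, append the result to memory, and stop when it answers.

def run_agent(model, tools, task, max_steps=10):
    memory = [task]                       # point 1: the agent keeps memory
    for _ in range(max_steps):            # point 2: a loop, not a pipeline
        action = model(memory)
        if action["type"] == "answer":
            return action["text"]
        tool = tools[action["tool"]]      # point 4: explicit tool routing
        memory.append(tool(action["input"]))
    return None                           # step budget exhausted

# Toy stand-in model: look something up once, then answer with it.
def toy_model(memory):
    if len(memory) == 1:
        return {"type": "tool", "tool": "lookup", "input": "capital of France"}
    return {"type": "answer", "text": memory[-1]}

tools = {"lookup": lambda query: "Paris"}
result = run_agent(toy_model, tools, "What is the capital of France?")
```

The contrast with a pipeline is that nothing fixes the number or order of tool calls up front; the model decides each step from the accumulated memory.
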

Jino Rohit (@jino_rohit)'s Twitter Profile Photo

not having a GPU is no excuse for not learning CUDA/Triton.

> sign in to platforms like leetgpu/tensara.
> make an account
> solve challenges, look at other solutions, read docs, get better.
vLLM (@vllm_project)'s Twitter Profile Photo

Maintaining separate attention kernels for every GPU platform doesn't scale.

The vLLM Triton attention backend takes a different approach: ~800 lines of Triton, same source code across NVIDIA, AMD, and Intel GPUs. On H100, it matches state-of-the-art attention performance. On
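
The numerical core of flash-attention-style kernels like this one is the online softmax: scan the keys block by block, track a running max, and rescale the partial sums whenever the max grows, so the full score row is never materialized. A pure-Python sketch for a single query row with scalar values (real kernels apply the same update per tile of the K/V matrices):

```python
import math

def online_softmax_weighted_sum(scores, values):
    """Compute sum_i softmax(scores)_i * values[i] in one streaming pass."""
    m = float("-inf")  # running max of scores seen so far
    denom = 0.0        # running sum of exp(score - m)
    acc = 0.0          # running softmax-weighted sum of values
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new) if m > float("-inf") else 0.0
        w = math.exp(s - m_new)
        denom = denom * scale + w   # rescale old sum to the new max
        acc = acc * scale + w * v
        m = m_new
    return acc / denom
```

Because the rescaling is exact, the result matches a two-pass softmax; that is what lets one kernel handle arbitrarily long key sequences in fixed on-chip memory.
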
Ali Taha (@aliestaha)'s Twitter Profile Photo

- 230 training runs
- 1,623 GPU hours (67 B200 days)
- 76 TB of training data
- a 2x faster model

Every paper said it can't be done. Quantization Aware Distillation made it possible.
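
The tweet doesn't share code, but a standard building block behind quantization-aware training and distillation is "fake quantization": round the weights onto a low-bit grid during the forward pass so the student learns under the same error it will see after deployment. An illustrative symmetric per-tensor int8 version (per-channel scales and the straight-through gradient trick are omitted):

```python
def fake_quantize_int8(weights):
    """Symmetric per-tensor int8 fake quantization.

    Snaps each weight to the nearest point on a 255-level grid spanning
    [-max|w|, +max|w|], then maps it back to float. Training through
    these rounded weights lets the model adapt to quantization error.
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / 127.0                          # int8 symmetric scale
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return [qi * scale for qi in q]                  # dequantize back to float

w = [0.1, -0.52, 1.27, 0.0]
wq = fake_quantize_int8(w)
```

During distillation the student's forward pass runs through weights like `wq` while its outputs are matched against the full-precision teacher's.
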

Joe (@joedab12)'s Twitter Profile Photo

If you guys aren't following the CEO of Cerebras, you should be. One of the brightest guys in the industry. Great write-up for the technical analysis guys who know nothing about AI. I'm somewhere in the middle and learned a lot here. Andrew Feldman would love to hear your

Victor M (@victormustar)'s Twitter Profile Photo

Very hyped by the new Cohere Transcribe model 🌍 Works surprisingly well on bad-quality audio when the mic doesn't cooperate. 2B params, 14 supported languages, and it's Apache 2.0. Try the official Hugging Face demo ⬇️

Daily Dose of Data Science (@dailydoseofds_)'s Twitter Profile Photo

NVIDIA & Unsloth dropped one of the best practical guides on building RL environments from scratch, filling gaps most tutorials skip.

Covers:

- Why RL environments matter + how to build them
- When RL beats SFT
- GRPO & RL best practices
- How verifiable rewards & RLVR work
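
The GRPO step such guides describe boils down to a group-relative advantage: sample several completions per prompt, score each with the verifiable reward, and normalize within the group, so no learned value network is needed. A hedged sketch (population std is used here; implementations vary in the exact normalization):

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (reward - group mean) / group std.

    The group of completions sampled for one prompt serves as its own
    baseline, which is why GRPO can drop the value network PPO requires.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Verifiable reward (RLVR-style): 1.0 if the completion's answer checks
# out exactly, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]   # 4 completions sampled for one prompt
advantages = grpo_advantages(rewards)
```

Each advantage then weights the policy-gradient term for its completion's tokens, pushing probability toward completions that beat their own group's average.
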