Aaryan Singhal (@aaryansinghal4) 's Twitter Profile
Aaryan Singhal

@aaryansinghal4

cs @ stanford

ID: 1342483051641794561

Link: https://www.aaryan-singhal.com/ · Joined: 25-12-2020 14:51:29

156 Tweets

429 Followers

563 Following

hazyresearch (@hazyresearch) 's Twitter Profile Photo

The Great American AI Race. I wrote something about how we need a holistic AI effort from academia, industry, and the US government to have the best shot at a freer, better educated, and healthier world in AI. I’m a mega bull on the US and open source AI. Maybe we’re cooking

Together AI (@togethercompute) 's Twitter Profile Photo

Our latest joint work w/ SandyResearch @ UCSD: training-free acceleration of Diffusion Transformers w/ dynamic sparsity, led by Austin Silveria and soham! ⚡️ 3.7x faster video and 1.6x faster image generation while preserving quality! 🧵 Open-source code & CUDA kernels!

Austin Silveria (@austinsilveria) 's Twitter Profile Photo

Training-free acceleration of Diffusion Transformers with dynamic sparsity and cross-step attention/MLP deltas--collaboration with soham and Dan Fu! ⚡️ 3.7x faster video and 1.6x faster image generation while preserving quality! 🧵 Open-source code & CUDA kernels!

Dan Fu (@realdanfu) 's Twitter Profile Photo

Super excited to share Chipmunk 🐿️- training-free acceleration of diffusion transformers (video, image generation) with dynamic attention & MLP sparsity! Led by Austin Silveria, soham - 3.7x faster video gen, 1.6x faster image gen. Kernels written in TK ⚡️🐱 1/

soham (@sohamgovande) 's Twitter Profile Photo

introducing chipmunk—a training-free algorithm making ai video generation 3.7x & image gen 1.6x faster! ⚡️ our kernels for column-sparse attention are 9.3x faster than FlashAttention-3 and column-sparse GEMM is 2.5x faster vs. cuBLAS. a thread on the GPU kernel optimizations 🧵
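
For intuition, here is a rough dense-PyTorch emulation of the column-sparse attention pattern described in the thread: each query attends only to a selected subset of key/value columns. The function name, shapes, and random column selection below are illustrative assumptions; the 9.3x-vs-FlashAttention-3 figure comes from the hand-written CUDA kernels, not from anything like this sketch.

```python
import math
import torch

def column_sparse_attention(q, k, v, col_idx):
    # q: [B, H, Lq, D]; k, v: [B, H, Lk, D]; col_idx: 1-D indices into Lk.
    # Dense emulation only: gather the kept key/value columns, then run
    # ordinary softmax attention over that subset.
    k_sub = k[:, :, col_idx, :]
    v_sub = v[:, :, col_idx, :]
    scores = q @ k_sub.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v_sub

# Toy shapes; in Chipmunk the kept columns are chosen dynamically per step,
# here a random subset simply stands in for that selection.
B, H, L, D = 1, 8, 1024, 64
q, k, v = (torch.randn(B, H, L, D) for _ in range(3))
cols = torch.randperm(L)[: L // 4]               # keep 25% of the columns
out = column_sparse_attention(q, k, v, cols)     # [B, H, L, D]
```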

Benjamin F Spector (@bfspector) 's Twitter Profile Photo

(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces.

So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel.

Megakernels are faster & more humane. Here’s how to treat your Llamas ethically:

(Joint
Stuart Sul (@stuart_sul) 's Twitter Profile Photo

GPU kernel launches are expensive--so we fused the entire Llama-1B into a single kernel. Very excited to kick off our megakernel framework series with Thunderkittens hazyresearch. More coming soon!

Jordan Juravsky (@jordanjuravsky) 's Twitter Profile Photo

We wrote a megakernel! Excited to share how we fused Llama-1B into a single kernel to reach SOTA latency. Check out our blog post and code below!

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

So so so cool. Llama 1B batch one inference in one single CUDA kernel, deleting synchronization boundaries imposed by breaking the computation into a series of kernels called in sequence. The *optimal* orchestration of compute and memory is only achievable in this way.
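
As a loose illustration of the synchronization boundaries being deleted (not the megakernel itself, which is hand-written CUDA), the toy PyTorch timing below issues the same arithmetic either as many tiny ops with a forced host sync after each, or as one fused op; the gap is essentially launch/sync overhead.

```python
import time
import torch

# Toy microbenchmark: many small ops with a host sync after each (mimicking
# hard boundaries between separately launched kernels) vs. the same work in
# a single op. Only illustrates the overhead a megakernel removes.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, device=device)

def many_boundaries(x, n=1000):
    for _ in range(n):
        x = x + 1.0
        if device == "cuda":
            torch.cuda.synchronize()   # artificial per-op boundary
    return x

def one_launch(x, n=1000):
    return x + float(n)                # same result, one op

for fn in (many_boundaries, one_launch):
    start = time.perf_counter()
    fn(x)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f}s")
```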

Owen Dugan (@owendugan) 's Twitter Profile Photo

A megakernel for Llama!🦙 We built a single kernel for the entire Llama 1B forward pass, enabling >1000 tokens/s on a single H100 and almost 1500 tokens/s on a single B200! Check it out!

Austin Silveria (@austinsilveria) 's Twitter Profile Photo

chipmunk is up on arxiv!

across HunyuanVideo and Flux.1-dev, 5-25% of the intermediate activation values in attention and MLPs account for 70-90% of the change in activations across steps

caching + sparsity speeds up generation by only recomputing fast changing activations
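
A minimal sketch of that caching + sparsity idea for a position-wise MLP, assuming outputs only need refreshing where inputs moved the most since the previous diffusion step; the name sparse_delta_mlp and the keep_frac parameter are made up for illustration, and the real Chipmunk kernels do not materialize tensors this way.

```python
import torch

def sparse_delta_mlp(mlp, x, prev_x, prev_y, keep_frac=0.25):
    # Score each position by how far its input moved since the previous step,
    # recompute the MLP only at the fastest-changing positions, and reuse the
    # cached output everywhere else. Toy sketch for a position-wise MLP.
    delta = (x - prev_x).abs().sum(dim=-1)              # [batch, seq]
    k = max(1, int(keep_frac * delta.shape[-1]))
    idx = delta.topk(k, dim=-1).indices                 # [batch, k]
    rows = idx.unsqueeze(-1).expand(-1, -1, x.shape[-1])
    y = prev_y.clone()
    y.scatter_(1, rows, mlp(torch.gather(x, 1, rows)))  # refresh selected rows
    return y

# Hypothetical usage with a tiny MLP and a small perturbation between "steps".
mlp = torch.nn.Sequential(torch.nn.Linear(64, 256), torch.nn.GELU(),
                          torch.nn.Linear(256, 64))
prev_x = torch.randn(2, 128, 64)
prev_y = mlp(prev_x)
x = prev_x + 0.01 * torch.randn_like(prev_x)
y = sparse_delta_mlp(mlp, x, prev_x, prev_y)
```
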
Dan Fu (@realdanfu) 's Twitter Profile Photo

Some updates to Chipmunk! 🐿️ Chipmunk now supports Wan 2.1, with up to 2.67x speedup - completely training-free! The paper is up on arXiv - take a look to see more in-depth analysis of sparsity in video models. Only 5-25% of activations account for >90% of the output!
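
For reference, a minimal sketch of how one might measure that kind of concentration statistic on a flattened activation-delta vector; the synthetic tensor below is just a stand-in, so the printed shares will not reproduce the 5-25% / >90% figures reported in the paper.

```python
import torch

# Synthetic stand-in: most activations barely move between steps, a few move a lot.
torch.manual_seed(0)
prev_act = torch.randn(1_000_000)
curr_act = prev_act + 0.001 * torch.randn(1_000_000)
hot = torch.randperm(1_000_000)[:100_000]          # 10% "fast-changing" entries
curr_act[hot] += 0.5 * torch.randn(100_000)

delta = (curr_act - prev_act).abs()
sorted_delta, _ = delta.sort(descending=True)
total = delta.sum()
for p in (0.05, 0.10, 0.25):
    k = int(p * delta.numel())
    share = sorted_delta[:k].sum() / total
    print(f"top {p:.0%} of deltas carry {share:.1%} of the total change")
```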

Tanvir Bhathal (@bhathaltanvir0) 's Twitter Profile Photo

Super excited to announce Weaver! Check it out to see the strongest way to verify LM Generations while maintaining compute efficiency!

Brendan McLaughlin (@brendanm0407) 's Twitter Profile Photo

Thrilled to share that I’ve joined Reflection AI! We’re building superintelligent autonomous systems by co-designing research and product. Today, we’re launching Asimov. As AI benchmarks saturate, evaluation will increasingly live inside real-world products that are

Robby Manihani (@robbymanihani) 's Twitter Profile Photo

Today we're announcing Pace, where we are building the world's first Agent Process Outsourcer for insurance operations. Traditional industry runs on legacy BPOs and consultants, and we're reimagining it. Our agent can handle documents of any length, conduct complex reasoning, and