Aditya Kane (@adityakane1)'s Twitter Profile
Aditya Kane

@adityakane1

Issuing one CTA at a time.

ID: 1244310176225787907

Link: http://adityakane2001.github.io
Joined: 29-03-2020 17:07:29

569 Tweets

181 Followers

374 Following

Humphrey Shi (@humphrey_shi)'s Twitter Profile Photo

We are releasing a major NATTEN upgrade that brings you new Hopper & Blackwell sparse attention kernels, both capable of realizing Theoretical Max Speedup:
90% sparsity -> 10X speedup.

Thanks to the great efforts by Ali Hassani & @NVIDIA cutlass team!

natten.org
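
A quick way to sanity-check the "90% sparsity -> 10X speedup" figure: if a fraction s of the attention work is skipped and the skipped work costs nothing, the best possible speedup is 1 / (1 - s). A minimal sketch (plain Python, independent of NATTEN):

```python
def theoretical_max_speedup(sparsity: float) -> float:
    """Best-case speedup if the skipped fraction of attention work costs nothing."""
    assert 0.0 <= sparsity < 1.0
    return 1.0 / (1.0 - sparsity)

print(theoretical_max_speedup(0.90))  # 10.0 -> 90% sparsity caps out at 10X
```
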
Sharon Goldman (@sharongoldman)'s Twitter Profile Photo

Exclusive: Ex-Meta AI leaders debut an agent that scours the web for you in a push to ultimately give users their own digital ‘chief of staff’ Devi Parikh Abhishek Das @DhruvBatraDB fortune.com/2025/06/10/exc…

Devi Parikh (@deviparikh)'s Twitter Profile Photo

It's here! Introducing Scouts by Yutori. Scouts is like having a team of agents monitoring the web for information that matters to you. We're letting more users in every day. Join the waitlist!

Dhruv Batra (@dhruvbatradb)'s Twitter Profile Photo

Scouts by Yutori. AI agents that monitor the web for things you care about. So you can focus on the meaningful things in life and experience a bit more yutori.

Aditya Kane (@adityakane1)'s Twitter Profile Photo

Scouts from Yutori keep tabs on things you care about so you never miss an important update!

An update like AMD securing a big win in the AI chip market :)
Stas Bekman (@stasbekman)'s Twitter Profile Photo

Oh wow, the newly released nccl finally started to use fp32 accumulation for reduction ops with half precision inputs! This is so important! Thank you NCCL team!

github.com/NVIDIA/nccl/co…

I'd imagine we will see this version in pytorch>=2.8, unless you build your own nccl.
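
To see why this matters, here is a tiny illustration in plain NumPy (not NCCL itself): when many half-precision values are reduced with an fp16 accumulator, the running sum stalls once its floating-point spacing exceeds the addends, whereas accumulating in fp32 and casting back at the end stays close to the true total.

```python
import numpy as np

# Reduce 10,000 copies of 0.01 (true sum = 100.0).
vals = np.full(10_000, 0.01, dtype=np.float16)

# fp16 accumulator: once the running sum's ULP exceeds ~2x the addend,
# further additions round away to nothing and the sum stalls (~32 here).
acc16 = np.float16(0.0)
for v in vals:
    acc16 = np.float16(acc16 + v)

# fp32 accumulator, cast back at the end: ~100.0 as expected.
acc32 = vals.astype(np.float32).sum()

print(acc16, np.float16(acc32))
```
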
Ali Hassani (@alihassanijr)'s Twitter Profile Photo

Cosmos-Predict2 meets NATTEN. We just released variants of Cosmos-Predict2 where we replace most self-attention layers with neighborhood attention, bringing up to 2.6X end-to-end speedup, with minimal effect on quality! github.com/nvidia-cosmos/… (1/5)
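
For context on what neighborhood attention does: each query attends only to a local window of keys rather than the full sequence, which is where the sparsity (and hence the speedup) comes from. Below is a minimal, unfused 1D sketch in plain PyTorch for illustration only; it is not NATTEN's API, which provides fused multidimensional kernels.

```python
import torch
import torch.nn.functional as F

def neighborhood_attention_1d(q, k, v, radius=3):
    """Naive 1D neighborhood attention: each token attends only to tokens
    within `radius` positions of itself. Purely illustrative -- this masked
    softmax still does O(L^2) work and only shows the sparsity pattern."""
    # q, k, v: (batch, heads, length, head_dim)
    L, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5              # (B, H, L, L)
    idx = torch.arange(L, device=q.device)
    outside = (idx[None, :] - idx[:, None]).abs() > radius   # True outside the window
    scores = scores.masked_fill(outside, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 2, 16, 8)   # 16 tokens, window of 7 (radius 3)
print(neighborhood_attention_1d(q, k, v).shape)  # torch.Size([1, 2, 16, 8])
```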

Humphrey Shi (@humphrey_shi)'s Twitter Profile Photo

Sparse Attention is now pushing World Foundation Models to the Speed of Light! Attention powers modern AI (Transformers, ViTs, DiTs), and Sparse Attention is the next frontier. Neighborhood Attention (NA) is the first multidimensional sparse attention infrastructure that: -

Ali Hassani (@alihassanijr)'s Twitter Profile Photo

Watch my talk about NATTEN on GPU MODE this Saturday at 3PM ET / noon PT. I'll go over all the exciting new features we shipped very recently, especially our Hopper and Blackwell FNA kernels, now speeding up video / world models by up to 2.6X e2e! youtube.com/watch?v=mF_H_J

Humphrey Shi (@humphrey_shi)'s Twitter Profile Photo

Check out Ali's talk tomorrow on GPU MODE if you breathe GPUs!
This is our 2nd GPU MODE talk — last time we unveiled Distributed GEMM: a CUTLASS-based Tensor Parallelism implementation that helps push NVL-based AI systems to the next level - transforming a network of GPUs into
FFmpeg (@ffmpeg)'s Twitter Profile Photo

Not sure why but we have lots of new followers! FFmpeg makes extensive use of hand-written assembly code for huge (10-50x) speed increases so we are providing assembly lessons to teach a new generation of assembly language programmers. Learn more here: github.com/FFmpeg/asm-les…