Owen Dugan (@owendugan)'s Twitter Profile
Owen Dugan

@owendugan

CS Ph.D. student @Stanford. Previously @MIT.

@HertzFoundation, @KnightHennessy

ID: 1542654374786338816

http://druidowm.github.io · Joined 30-06-2022 23:40:45

63 Tweets

196 Followers

213 Following

Benjamin F Spector (@bfspector)

(1/6) Joyously announcing ThunderKittens with real support on NVIDIA Blackwell! We've released BF16/FP8 GEMM and attention fwd+bwd kernels, up to 2x faster than cuBLAS GEMMs on H100. Blog: bit.ly/41tuT4Q With Dan Fu, Aaryan Singhal, and @hazyresearch!

Simran Arora (@simran_s_arora)

BASED ✌️ turns 1! One year since its launch at NeurIPS 2023, and it's helped shape the new wave of efficient LMs. ⚡️ Fastest linear attention kernels 🧠 405B models trained on 16 GPUs 💥 Inspired Mamba-v2, RWKVs, MiniMax. Check out our retrospective below!
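For context on what "linear attention" refers to (a sketch of the general technique, not BASED's exact kernels): a feature map replaces the softmax so attention can be computed with running sums in linear rather than quadratic time. A minimal NumPy version of the causal case, using a simple ReLU feature map as a stand-in for BASED's Taylor-approximation map:

import numpy as np

def feature_map(x):
    # Stand-in feature map; BASED uses a Taylor approximation of exp(q.k).
    return np.maximum(x, 0.0) + 1e-6

def causal_linear_attention(Q, K, V):
    # Q, K: (N, d); V: (N, d_v). Carries running sums instead of
    # materializing the N x N attention matrix.
    phi_Q, phi_K = feature_map(Q), feature_map(K)
    S = np.zeros((phi_K.shape[1], V.shape[1]))  # running sum of phi(k) v^T
    z = np.zeros(phi_K.shape[1])                # running sum of phi(k)
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        S += np.outer(phi_K[t], V[t])
        z += phi_K[t]
        out[t] = (phi_Q[t] @ S) / (phi_Q[t] @ z + 1e-6)
    return out

Q, K, V = (np.random.randn(8, 4) for _ in range(3))
print(causal_linear_attention(Q, K, V).shape)  # (8, 4)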

hazyresearch (@hazyresearch)

The Great American AI Race. I wrote something about how we need a holistic AI effort from academia, industry, and the US government to have the best shot at a freer, better educated, and healthier world in AI. I’m a mega bull on the US and open source AI. Maybe we’re cooking

Jerry Liu (@jerrywliu)

I'm at #ICLR2025 this week to present our work on 🔬high-precision algorithm learning🔬 with Transformers! Stop by our poster session Thursday afternoon! 🔗arxiv.org/abs/2503.12295 With Jess Grogan, Owen Dugan, Ashish Rao, Simran Arora, Atri Rudra, and hazyresearch!

Roberto Garcia (@garctrob)

I'm at #ICLR2025 presenting RaNA 🐸, an adaptive compression method to speed up Transformers, with Jerry Liu, Sabri Eyuboglu, and others! 🔗arxiv.org/abs/2503.18216 RaNA is based on the Adaptive Rank Allocation framework, which generalizes prior neuron-adaptive methods,

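A rough sketch of the rank-allocation idea (my illustration, not the RaNA code; the budgeting heuristic and names below are made up): factor each linear layer with a truncated SVD, but choose each layer's rank adaptively from its singular-value spectrum under one shared budget.

import numpy as np

def allocate_ranks(weights, budget_frac=0.5):
    # Spend a global rank budget on the (layer, direction) pairs with the
    # largest singular values, so layers with slowly decaying spectra keep more rank.
    spectra = [np.linalg.svd(W, compute_uv=False) for W in weights]
    budget = int(budget_frac * sum(len(sv) for sv in spectra))
    items = sorted(((s, i) for i, sv in enumerate(spectra) for s in sv), reverse=True)
    ranks = [0] * len(weights)
    for _, i in items[:budget]:
        ranks[i] += 1
    return ranks

def low_rank_factors(W, r):
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] * S[:r], Vt[:r]   # W ≈ A @ B with rank r

weights = [np.random.randn(256, 256) for _ in range(4)]
ranks = allocate_ranks(weights)
factors = [low_rank_factors(W, r) for W, r in zip(weights, ranks)]
print(ranks)
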
Avanika Narayan (@avanika15)

can you chat privately with a cloud llm—*without* sacrificing speed? excited to release minions secure chat: an open-source protocol for end-to-end encrypted llm chat with <1% latency overhead (even @ 30B+ params!). cloud providers can’t peek—messages decrypt only inside a

Dan Biderman (@dan_biderman)

We secure all communications with a cloud-hosted LLM, running on an H100 in confidential mode. Latency overhead goes away once you cross the 10B model size. This is our first foray into applied cryptography -- help us refine our ideas.

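Mechanically, "end-to-end encrypted chat with a cloud LLM" means the client and the confidential-mode GPU share a key and everything on the wire is AEAD ciphertext. A toy sketch of that shape (illustrative only, not the Minions protocol; the attested key exchange is elided and the session key here is just generated locally), using the `cryptography` package:

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

session_key = AESGCM.generate_key(bit_length=256)  # would come from key exchange with the enclave
aead = AESGCM(session_key)

def seal(plaintext: str) -> tuple[bytes, bytes]:
    nonce = os.urandom(12)
    return nonce, aead.encrypt(nonce, plaintext.encode(), b"chat-demo")

def unseal(nonce: bytes, ciphertext: bytes) -> str:
    return aead.decrypt(nonce, ciphertext, b"chat-demo").decode()

# client -> cloud: only ciphertext crosses the provider's infrastructure
nonce, ct = seal("summarize my private notes")
# ...inside the confidential-mode H100, the enclave decrypts, runs the model,
# and seals the response the same way...
print(unseal(nonce, ct))
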
Benjamin F Spector (@bfspector)

(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces. So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in a single kernel. Megakernels are faster & more humane. Here’s how to treat your Llamas ethically: (Joint

Jordan Juravsky (@jordanjuravsky)

Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for high-throughput workloads with large and small models. (Joint work with Ayush Chakravarthy, Ryan Ehrlich, Sabri Eyuboglu, Bradley Brown, Joseph Shetaye,

Sabri Eyuboglu (@eyuboglusabri)

When we put lots of text (eg a code repo) into LLM context, cost soars b/c of the KV cache’s size. What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x

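A very loose sketch of the "train a smaller KV cache offline" idea (my toy version, not the paper's self-study recipe): treat a short bank of key/value vectors as parameters and fit them so attention over the small bank matches attention over the full document's cache on sampled queries.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, doc_len, cache_len = 64, 2048, 64        # ~32x smaller cache

doc_k, doc_v = torch.randn(doc_len, d), torch.randn(doc_len, d)   # stand-in for the full document's KV cache
small_k = torch.randn(cache_len, d, requires_grad=True)
small_v = torch.randn(cache_len, d, requires_grad=True)

def attend(q, k, v):
    return F.softmax(q @ k.T / d ** 0.5, dim=-1) @ v

opt = torch.optim.Adam([small_k, small_v], lr=1e-2)
for step in range(500):
    q = torch.randn(32, d)                  # synthetic "self-study" queries
    loss = F.mse_loss(attend(q, small_k, small_v), attend(q, doc_k, doc_v))
    opt.zero_grad(); loss.backward(); opt.step()

print(f"distillation loss after training: {loss.item():.4f}")
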
Jon Saad-Falcon (@jonsaadfalcon)

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning

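A toy sketch of the weak-verifier-combination idea (not Weaver itself; the plain accuracy-weighted vote below is my stand-in for its actual weighting scheme): score every sampled answer with several imperfect verifiers, combine the scores, and keep the argmax.

import numpy as np

def combine_verifiers(scores, weights=None):
    # scores: (num_verifiers, num_candidates), each entry in [0, 1]
    scores = np.asarray(scores, dtype=float)
    weights = np.ones(len(scores)) if weights is None else np.asarray(weights, dtype=float)
    return (weights / weights.sum()) @ scores      # weighted average per candidate

# 3 weak verifiers (say, two reward models + an LM judge) scoring 4 samples
scores = [[0.6, 0.2, 0.7, 0.4],
          [0.5, 0.3, 0.9, 0.4],
          [0.4, 0.1, 0.6, 0.8]]
# weights could come from held-out accuracy or a latent-label (EM-style) fit
combined = combine_verifiers(scores, weights=[0.5, 0.3, 0.2])
print("selected candidate:", int(np.argmax(combined)))
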
Tilde (@tilderesearch)

Sparse attention (MoBA/NSA) trains faster & beats full attention in key tasks. But we’ve had no idea how they truly work…until now. 🔍 We reverse-engineered them to uncover: - Novel attention patterns - Hidden "attention sinks" - Better performance - And more A 🧵… ~1/8~

Jerry Liu (@jerrywliu)

1/10 ML can solve PDEs – but precision🔬is still a challenge. Towards high-precision methods for scientific problems, we introduce BWLer 🎳, a new architecture for physics-informed learning achieving (near-)machine-precision (up to 10⁻¹² RMSE) on benchmark PDEs. 🧵How it works:

Jordan Juravsky (@jordanjuravsky)

Check out Tokasaurus on Modal to make Llama-1B brrr! This repeated sampling example shows off two engine features that are important for serving small models: very low CPU overhead and automatic shared prefix exploitation with Hydragen.
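For flavor, a toy NumPy version of the shared-prefix decomposition Hydragen exploits (my sketch of the published idea, not Tokasaurus code): attention over the shared prefix is computed once, attention over each sequence's suffix is computed separately, and the two partial results are merged with their softmax normalizers.

import numpy as np

def attn_with_lse(q, K, V):
    # softmax attention output plus its log-sum-exp normalizer
    s = q @ K.T / np.sqrt(q.shape[-1])
    lse = np.log(np.sum(np.exp(s)))
    return np.exp(s - lse) @ V, lse

def shared_prefix_attention(q, Kp, Vp, Ks, Vs):
    o_p, lse_p = attn_with_lse(q, Kp, Vp)   # over the shared prefix (reused across the batch)
    o_s, lse_s = attn_with_lse(q, Ks, Vs)   # over this sequence's suffix
    w_p = np.exp(lse_p) / (np.exp(lse_p) + np.exp(lse_s))
    return w_p * o_p + (1.0 - w_p) * o_s

d = 16
q = np.random.randn(d)
Kp, Vp = np.random.randn(100, d), np.random.randn(100, d)
Ks, Vs = np.random.randn(5, d), np.random.randn(5, d)
ref, _ = attn_with_lse(q, np.vstack([Kp, Ks]), np.vstack([Vp, Vs]))
print(np.allclose(shared_prefix_attention(q, Kp, Vp, Ks, Vs), ref))  # True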

Michael Poli (@michaelpoli6)

Life update: I started Radical Numerics with Stefano Massaroli, Armin Thomas, Eric Nguyen, and a fantastic team of engineers and researchers. We are building the engine for recursive self‑improvement (RSI): AI that designs and refines AI, accelerating discovery across science and

Eric Nguyen (@exnx)

✨ Excited to share a few life updates! 🎤 My TED Talk is now live! I shared the origin story of Evo, titled: "How AI could generate new life forms" TED talk: ted.com/talks/eric_ngu… ✍️ I wrote a blog post about what it’s *really* like to deliver a TED talk blog: