Tri Dao (@tri_dao) Twitter Tweets • TwiCopy

Infini-AI-Lab

7 months ago

🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n

207 likes · 2 replies · 76 retweets

Inception Labs

@inceptionailabs

6 months ago

We’re excited to launch Mercury, the first commercial-scale diffusion LLM tailored for chat applications! Ultra-fast and efficient, Mercury brings real-time responsiveness to conversations, just like Mercury Coder did for code.

294 likes · 14 replies · 45 retweets

Tri Dao

@tri_dao

6 months ago

Crazy that we now have an open-source model with 13B params that's competitive with o1. And Mamba layers help bring much higher inference throughput.

611 likes · 23 replies · 53 retweets

Together AI

@togethercompute

6 months ago

Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. Built in

466 likes · 9 replies · 76 retweets

AI21 Labs

@ai21labs

6 months ago

Now live. A new update to our Jamba open model family 🎉 Same hybrid SSM-Transformer architecture, 256K context window, efficiency gains & open weights. Now with improved grounding & instruction following. Try it on AI21 Studio or download from Hugging Face 🤗 More on what

164 likes · 2 replies · 25 retweets

Tri Dao

@tri_dao

6 months ago

Turns out you can do length generalization for recurrent models by simply training for an extra 100 steps with a careful choice of initial states.

222 likes · 2 replies · 18 retweets

Tri Dao

@tri_dao

6 months ago

Albert articulates the trade-offs between Transformers and SSMs really well. This is why I work on both.

166 likes · 2 replies · 15 retweets

Liliang Ren

@liliang_ren

6 months ago

Reasoning can be made much, much faster, with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput

359 likes · 2 replies · 71 retweets

Aaron Gokaslan

@skyli0n

6 months ago

New smaller Flash Attention binaries releasing tonight when the build finishes in 8 hours!

35 likes · 4 replies · 1 retweet

Nadav Schneider

@nadavsch

6 months ago

Introducing Diff-Mamba! 🧠🔥 Differential design has been shown to reduce over-allocation of attention to irrelevant context in Transformers—improving robustness, ICL, retrieval, and long-context capabilities. Can it be effectively applied to Mamba? Answers in the thread🧵👇

102 likes · 1 reply · 23 retweets
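Diff-Mamba starts from the "differential design" used in Transformers. As a point of reference, here is a minimal pure-Python sketch of single-head differential attention as described in the Differential Transformer paper (Ye et al., 2024): two softmax attention maps are computed and the second, scaled by λ, is subtracted from the first so that attention both maps place on irrelevant tokens cancels out. Assumptions: λ is passed in as a fixed float (the paper learns and reparameterizes it), per-head normalization is omitted, and the helper names are hypothetical. This is not Diff-Mamba's own method, which the post leaves to the thread.

```python
import math


def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain list-of-lists matrix product: (n x m) @ (m x p) -> (n x p).
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention_map(q, k):
    # Row-wise softmax(Q K^T / sqrt(d)), where d is the query/key width.
    d = len(q[0])
    scores = matmul(q, transpose(k))
    return [softmax([s / math.sqrt(d) for s in row]) for row in scores]


def differential_attention(q1, k1, q2, k2, v, lam):
    # (softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d))) @ V
    a1 = attention_map(q1, k1)
    a2 = attention_map(q2, k2)
    diff = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return matmul(diff, v)


k = [[1.0, 0.0], [0.0, 1.0]]
v = [[2.0], [4.0]]
q = [[0.0, 0.0]]
print(differential_attention(q, k, q, k, v, 0.0))  # [[3.0]]: plain attention, uniform weights
print(differential_attention(q, k, q, k, v, 1.0))  # [[0.0]]: identical maps cancel completely
```

With λ = 0 the sketch reduces to ordinary softmax attention; as λ grows, weight that both maps share is subtracted away, which is the noise-cancelling effect the post refers to.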

Mayank Mishra

@mayankmish98

6 months ago

🦆QuACK: blazing-fast CuTe DSL GPU kernels with 3TB/s goodness! Optimizing your kernels as much as possible is important... unless you are okay with leaving throughput on the table. Check out this work from vlaw, Ted Zadouri and Tri Dao.

18 likes · 0 replies · 6 retweets

Ted Zadouri

@tedzadouri

6 months ago

CuTe DSL feels almost unreal: minimal Python code hits peak memory throughput on H100, as we show in QuACK. Can't wait for the addition of kernels optimized for Blackwell in QuACK 🦆

21 likes · 0 replies · 1 retweet

Tri Dao

@tri_dao

6 months ago

They’ve finally done it. They got rid of tokenizers!

790 likes · 9 replies · 58 retweets

Princeton Computer Science

@princetoncs

6 months ago

Congrats to Parastoo Abtahi, Tri Dao and Alex Lombardi on being named 2025 Google Research Scholars. 🎉 The @googleresearch scholars program funds world-class research conducted by early-career professors. bit.ly/4kvpvFx

76 likes · 0 replies · 5 retweets

Tri Dao

@tri_dao

6 months ago

I played with it for 1h. Went through my usual prompts (math derivations, floating-point optimizations, …). It's a good model; it feels comparable to the best frontier models.

425 likes · 4 replies · 32 retweets

Sanjeev Arora

@prfsanjeevarora

6 months ago

Congratulations to Parastoo Abtahi, Tri Dao, and Alex on this honor. Chats with people like this in the coffee room are a special pleasure at work!

46 likes · 1 reply · 2 retweets

Together AI

@togethercompute

6 months ago

🚨MAJOR DROP: Kimi K2 just landed on Together AI 🚀 An open-source 1T parameter model that beats proprietary LLMs in creativity, coding, and tool use while delivering 60-70% cost savings. Built for agents. Priced for scale. 👇

329 likes · 11 replies · 32 retweets

Yong Lin

@yong18850571

6 months ago

(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B

224 likes · 6 replies · 77 retweets
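Several posts above quote Pass@k numbers (DeepSWE's 42.2% Pass@1, Goedel-Prover V2's 90.4% at Pass@32). As a reminder of what that metric usually means, here is a minimal sketch of the widely used unbiased pass@k estimator from the Codex/HumanEval evaluation: sample n attempts per problem, count the c correct ones, and estimate the chance that at least one of k attempts succeeds. The function name `pass_at_k` is hypothetical, and the posts do not say whether these teams computed their scores exactly this way.

```python
from math import comb, prod


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled attempts, c of which are correct.

    Equals 1 - C(n - c, k) / C(n, k): one minus the probability that k attempts
    drawn without replacement from the n are all incorrect.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct attempt.
        return 1.0
    # Product form of the binomial ratio avoids huge intermediate integers.
    return 1.0 - prod(1.0 - k / i for i in range(n - c + 1, n + 1))


print(pass_at_k(4, 1, 1))   # 0.25, i.e. c / n when k = 1
print(pass_at_k(10, 8, 3))  # 1.0
# Cross-check the product form against the closed-form binomial ratio.
print(abs(pass_at_k(100, 30, 32) - (1 - comb(70, 32) / comb(100, 32))) < 1e-9)  # True
```

A per-benchmark score is then the mean of `pass_at_k` over all problems.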