Tri Dao (@tri_dao)'s Twitter Profile
Tri Dao

@tri_dao

Asst. Prof @PrincetonCS, Chief Scientist @togethercompute. Machine learning & systems.

ID: 568879807

Link: https://tridao.me · Joined: 02-05-2012 07:13:50

690 Tweets

23.23K Followers

442 Following

Tri Dao (@tri_dao)'s Twitter Profile Photo

I love Cutlass. The same core abstractions have achieved top matmul speed spanning multiple GPU generations. Really speaks to how well it's designed.

Tri Dao (@tri_dao)'s Twitter Profile Photo

If you like ML systems and San Diego beaches, you should work with Dan! Meanwhile, we're lucky to have him spending some time at Together AI. I've seen an early preview of what he's building, and it's mind-blowing!

Tri Dao (@tri_dao)'s Twitter Profile Photo

You can now distill pretrained Transformers to Mamba / hybrid architectures to get really strong models with fast inference in just a few billion tokens. Beautiful math as always.
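
For context, a minimal sketch of the kind of logit-distillation objective such a pipeline might use (PyTorch; the loss form and hyperparameters here are illustrative, not the paper's actual recipe):

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    # Match the student's next-token distribution to the frozen
    # teacher's via temperature-softened KL divergence.
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_logp, t_probs, reduction="batchmean") * temperature**2

# Usage sketch: `teacher` is the pretrained Transformer (frozen),
# `student` is the Mamba / hybrid model being distilled:
#   with torch.no_grad():
#       teacher_logits = teacher(input_ids)
#   loss = distill_loss(student(input_ids), teacher_logits)
```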

AI21 Labs (@ai21labs)'s Twitter Profile Photo

📄 Jamba-1.5 whitepaper is out!
The whitepaper details the architecture, training schemes, novelties, and in-depth evaluations of our new long-context hybrid SSM-Transformer models, Jamba-1.5-Large and Jamba-1.5-Mini.

arXiv: arxiv.org/abs/2408.12570

Here are some highlights and…
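
For readers unfamiliar with the term: "hybrid SSM-Transformer" means interleaving linear-time recurrent blocks with a few full-attention blocks. A toy sketch of the stacking pattern, using nn.GRU as a stand-in for Mamba layers (the real layer ratio, MoE layers, and block internals are in the paper):

```python
import torch
import torch.nn as nn

class HybridStack(nn.Module):
    # Toy hybrid: recurrent blocks (nn.GRU standing in for Mamba) for
    # most layers, full attention every `attn_every`-th layer.
    def __init__(self, d, n_layers=8, attn_every=4, n_heads=4):
        super().__init__()
        self.norms = nn.ModuleList(nn.LayerNorm(d) for _ in range(n_layers))
        self.blocks = nn.ModuleList()
        for i in range(n_layers):
            if (i + 1) % attn_every == 0:
                self.blocks.append(nn.MultiheadAttention(d, n_heads, batch_first=True))
            else:
                self.blocks.append(nn.GRU(d, d, batch_first=True))

    def forward(self, x):                        # x: (batch, seq, d)
        for norm, blk in zip(self.norms, self.blocks):
            h = norm(x)
            if isinstance(blk, nn.MultiheadAttention):
                out, _ = blk(h, h, h, need_weights=False)
            else:
                out, _ = blk(h)
            x = x + out                          # residual connection
        return x

y = HybridStack(32)(torch.randn(2, 16, 32))     # -> (2, 16, 32)
```
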
NVIDIA AI Developer (@nvidiaaidev)'s Twitter Profile Photo

✨ 👩‍💻 CUDA MODE is hosting the first-ever IRL event in SoMa SF on Sep 21 🎊 Join Andrej Karpathy, Tri Dao, Wen-mei Hwu, and other ML experts for keynotes and hacking sessions. Reg is limited ➡️ events.accel.com/cudamode 🚀💻

Cartesia (@cartesia_ai)'s Twitter Profile Photo

Today, we’re unveiling a significant milestone in our journey toward ubiquitous artificial intelligence: AI On-Device.

Our team pioneered a radically more efficient architecture for AI with state space models (SSMs). Now, we’ve optimized and deployed them at the edge. We believe…
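
The on-device appeal comes from the SSM recurrence itself: each new token updates a fixed-size state, so memory does not grow with context the way a Transformer's KV cache does. A minimal illustration (all tensors and coefficients below are made up):

```python
import torch

def ssm_step(state, x, a, b, c):
    # One decode step of a diagonal linear SSM:
    #   h_t = a * h_{t-1} + b * x_t ;  y_t = c * h_t
    # Per-token compute and state size are constant in sequence length,
    # unlike attention's ever-growing KV cache.
    state = a * state + b * x
    return state, c * state

d = 16                                    # made-up state size
a, b, c = torch.rand(d) * 0.9, torch.ones(d), torch.ones(d)
state = torch.zeros(d)
for _ in range(10_000):                   # 10k tokens, memory stays O(d)
    state, y = ssm_step(state, torch.randn(d), a, b, c)
```
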
Sasha Rush (@srush_nlp)'s Twitter Profile Photo

The Mamba in the Llama: arxiv.org/abs/2408.15237 RNNs are neat. Here's a video describing how to make them work really well with little money: youtube.com/watch?v=A5ff8h… (by Junxiong Wang and Daniele Paliotta)

Tri Dao (@tri_dao)'s Twitter Profile Photo

We made distillation and spec decoding work with Mamba (and linear RNNs in general)! Up to 300 tok/sec for 7B 🚀. Spec dec is nontrivial as there's no KV cache to backtrack if some tokens aren't accepted, but there's an efficient hardware-aware algorithm to recompute the SSM states.
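
A minimal sketch of that verification loop, using greedy acceptance for readability (the actual method uses rejection sampling, and `target(tokens, state)` is an assumed API that scans tokens from a recurrent state and returns per-position logits plus only the final state, which is why the accepted prefix gets re-scanned):

```python
import torch

def verify_draft(target, state, last_token, draft):
    # Greedy-acceptance sketch of one speculative-decoding round for an
    # SSM target model. Assumed API: target(tokens, state) scans `tokens`
    # from recurrent `state`, returning per-position logits and only the
    # FINAL state -- fused scans often don't materialize intermediate
    # states, hence the recompute of the accepted prefix below.
    tokens = torch.cat([last_token, draft]).unsqueeze(0)   # (1, K+1)
    logits, _ = target(tokens, state)                      # score all drafts at once
    preds = logits[0, :-1].argmax(-1)                      # target's pick at each slot
    n_ok = 0
    while n_ok < len(draft) and preds[n_ok] == draft[n_ok]:
        n_ok += 1
    accepted = draft[:n_ok]
    # No KV cache to truncate: re-scan just the accepted prefix to
    # rebuild the recurrent state at the last accepted token.
    _, state = target(torch.cat([last_token, accepted]).unsqueeze(0), state)
    next_tok = logits[0, n_ok].argmax(-1, keepdim=True)    # target's correction
    return torch.cat([accepted, next_tok]), state
```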

Tri Dao (@tri_dao)'s Twitter Profile Photo

We have some excellent interns like James at Together AI this summer doing research on efficient training & inference. Sparsity in LLMs feels fundamental; here's an example of using sparsity to get faster LLM decoding.
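
As a rough illustration of the general idea (not the specific method in the linked work): if only a few hidden neurons matter for a given token, the decode-time MLP matmuls can touch just a slice of the weights.

```python
import torch

def sparse_mlp_decode(x, W_in, W_out, k=256):
    # Illustrative activation-sparse MLP for decoding: keep only the
    # top-k hidden neurons, so the output matmul touches a (k, d) slice
    # of W_out instead of all (d_hidden, d) rows.
    h = torch.relu(x @ W_in)               # (d_hidden,) activations
    vals, idx = torch.topk(h, k)           # k largest activations
    return vals @ W_out[idx]               # O(k*d) instead of O(d_hidden*d)

# Real systems predict the active set with a small router so W_in is
# sparsified too; computing h densely above is only for clarity.
d, dh = 1024, 4096
x, W_in, W_out = torch.randn(d), torch.randn(d, dh), torch.randn(dh, d)
y = sparse_mlp_decode(x, W_in, W_out)      # -> (1024,)
```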

Tri Dao (@tri_dao)'s Twitter Profile Photo

Surprisingly, speculative decoding works well not just for small-batch LLM inference but also for large batch and long context. Once we understood the compute & memory profile of LLM inference, the new spec dec algorithms fell out naturally.
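
A back-of-envelope calculation shows why: at long context, per-step memory traffic is dominated by KV-cache reads rather than weight reads, so a draft model with a tiny cache keeps paying off even at large batch (all numbers below are illustrative, not measurements):

```python
n_layers, n_kv_heads, head_dim = 32, 8, 128   # hypothetical 7B-ish config
bytes_per = 2                                  # fp16
ctx, batch = 32_000, 64

weight_bytes = 7e9 * bytes_per                 # weights: read once per decode step
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per * batch
#          ^ K and V, per layer, per position, per sequence in the batch

print(f"weights: {weight_bytes / 1e9:.0f} GB/step, KV cache: {kv_bytes / 1e9:.0f} GB/step")
# -> KV reads (~268 GB) dwarf weight reads (~14 GB) at this batch/context,
#    so shrinking the draft's KV cache matters more than shrinking its
#    parameter count.
```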

Tri Dao (@tri_dao)'s Twitter Profile Photo

The work on spec dec for large batch and long context at Together AI has been a great collaboration with Beidi Chen and her lab. Check out their MagicDec; it's an elegant way to use e.g. StreamingLLM to reduce the KV cache of the draft model.
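
A sketch of the StreamingLLM-style cache idea, i.e. a few "attention sink" tokens plus a sliding window, so the draft's cache stays constant-size regardless of context length (fields and shapes are illustrative, not MagicDec's actual code):

```python
import torch

class SinkWindowCache:
    # StreamingLLM-style KV cache sketch: keep a few 'sink' tokens from
    # the start of the sequence plus a sliding window of recent tokens,
    # so the draft model's cache (and its memory traffic) is constant
    # in context length.
    def __init__(self, n_sink=4, window=512):
        self.n_sink, self.window = n_sink, window
        self.k = self.v = None                  # (seq, n_heads, head_dim)

    def append(self, k_new, v_new):
        self.k = k_new if self.k is None else torch.cat([self.k, k_new])
        self.v = v_new if self.v is None else torch.cat([self.v, v_new])
        if self.k.size(0) > self.n_sink + self.window:   # evict the middle
            self.k = torch.cat([self.k[:self.n_sink], self.k[-self.window:]])
            self.v = torch.cat([self.v[:self.n_sink], self.v[-self.window:]])
        return self.k, self.v
```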

Vipul Ved Prakash (@vipulved)'s Twitter Profile Photo

Excited to finally bring this product to market! The kernel team at Together AI, led by Tri Dao, has optimized a plethora of operators, some fairly fundamental ones, and made them available as kernels that can be registered in your training loop or inference…
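
One simple pattern for "registering" an optimized kernel into an existing training loop is a drop-in module swap; a minimal sketch, where `fused_cls` stands for a hypothetical optimized replacement with a LayerNorm-compatible constructor:

```python
import torch.nn as nn

def swap_in_fused_kernel(model: nn.Module, fused_cls) -> nn.Module:
    # Recursively replace every nn.LayerNorm with `fused_cls`, a
    # hypothetical optimized drop-in with a LayerNorm-compatible
    # constructor. The model code itself never changes.
    for name, child in model.named_children():
        if isinstance(child, nn.LayerNorm):
            setattr(model, name, fused_cls(child.normalized_shape, eps=child.eps))
        else:
            swap_in_fused_kernel(child, fused_cls)
    return model
```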