Tri Dao (@tri_dao)'s Twitter Profile
Tri Dao

@tri_dao

Asst. Prof @PrincetonCS, Chief Scientist @togethercompute. Machine learning & systems.

ID: 568879807

Link: https://tridao.me · Joined: 02-05-2012 07:13:50

690 Tweets

23.23K Followers

442 Following

Tri Dao (@tri_dao):

I love Cutlass. The same core abstractions have achieved top matmul speed across multiple GPU generations. Really speaks to how well it's designed.

Tri Dao (@tri_dao):

If you like ML systems and San Diego beaches, you should work with Dan! Meanwhile, we're lucky to have him spending some time at Together AI. I've seen an early preview of what he's building, and it's mind-blowing!

Tri Dao (@tri_dao):

You can now distill pretrained Transformers into Mamba / hybrid architectures to get really strong models with fast inference in just a few billion tokens. Beautiful math as always.
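
For intuition, here's a minimal sketch of the logit-distillation objective such a recipe builds on: a KL loss pulling the Mamba/hybrid student's next-token distribution toward the frozen Transformer teacher's. The temperature, shapes, and function names are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal logit-distillation sketch (illustrative, not the paper's recipe).
import numpy as np

def softmax(z, T):
    z = z / T - (z / T).max(axis=-1, keepdims=True)  # temperature + stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Mean KL(teacher || student) over positions, at temperature T."""
    p = softmax(teacher_logits, T)  # frozen Transformer teacher
    q = softmax(student_logits, T)  # Mamba / hybrid student
    return float(np.mean(np.sum(p * (np.log(p + 1e-9) - np.log(q + 1e-9)), axis=-1)))

# toy example: 4 positions, vocab of 10
rng = np.random.default_rng(0)
print(distill_loss(rng.normal(size=(4, 10)), rng.normal(size=(4, 10))))
```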

AI21 Labs (@ai21labs):

📄 Jamba-1.5 whitepaper is out! The whitepaper details the architecture, training schemes, novelties, and in-depth evaluations of our new long-context hybrid SSM-Transformer models, Jamba-1.5-Large and Jamba-1.5-Mini. arXiv: arxiv.org/abs/2408.12570 Here are some highlights and

NVIDIA AI Developer (@nvidiaaidev):

✨ 👩‍💻 CUDA MODE is hosting its first-ever IRL event in SoMa, SF on Sep 21 🎊 Join Andrej Karpathy, Tri Dao, Wen-mei Hwu, and other ML experts for keynotes and hacking sessions. Registration is limited ➡️ events.accel.com/cudamode 🚀💻

Cartesia (@cartesia_ai):

Today, we’re unveiling a significant milestone in our journey toward ubiquitous artificial intelligence: AI On-Device. Our team pioneered a radically more efficient architecture for AI with state space models (SSMs). Now, we’ve optimized and deployed them at the edge. We believe

Sasha Rush (@srush_nlp):

The Mamba in the Llama: arxiv.org/abs/2408.15237 RNNs are neat. Here's a video describing how to make them work really well with little money: youtube.com/watch?v=A5ff8h… (by Junxiong Wang and Daniele Paliotta)

Tri Dao (@tri_dao):

We made distillation and spec decoding work with Mamba (and linear RNNs in general)! Up to 300 tok/sec for 7B 🚀. Spec dec is nontrivial since there's no KV cache to backtrack to if some tokens aren't accepted, but there's an efficient hardware-aware algorithm to recompute the SSM states.
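
To see why backtracking is the crux, here's a toy sketch: a scalar linear recurrence stands in for the SSM, the verifier materializes only the state after all k draft tokens, and on a rejection it must recompute the state for the accepted prefix. The acceptance rule and the naive replay are assumptions for illustration; the actual recomputation is the paper's hardware-aware kernel.

```python
# Toy speculative decoding with a recurrent state instead of a KV cache.
# h_t = A*h_{t-1} + B*x_t stands in for the SSM (illustrative only).
A, B = 0.9, 0.5

def advance(h, tokens):
    for x in tokens:
        h = A * h + B * x
    return h

def verify(h, drafts, accept):
    h_after_k = advance(h, drafts)  # one pass over all k drafts: only the
                                    # final state is materialized
    n = 0
    for tok in drafts:
        if not accept(tok):
            break
        n += 1
    if n == len(drafts):
        return h_after_k, n
    # A rejected token leaves no cached intermediate state to back up to,
    # so recompute the state for the accepted prefix (here: naive replay).
    return advance(h, drafts[:n]), n

state, n_ok = verify(0.0, drafts=[1.0, 2.0, -3.0, 4.0], accept=lambda t: t > 0)
print(n_ok, state)  # 2 tokens accepted; state reflects only those two
```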

Tri Dao (@tri_dao):

We have some excellent interns like James at Together AI this summer doing research on efficient training & inference. Sparsity in LLMs feels fundamental; here's an example of using sparsity to get faster LLM decoding.
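
As a generic illustration of that idea (not the specific method the tweet refers to), here's a sketch of activation sparsity in an MLP block: if only a few neurons fire for a given token, decoding can gather just those weight columns and skip the rest. The sizes and the top-k selection are assumptions; in practice a cheap predictor would pick the active neurons without computing the dense activations first.

```python
# Activation-sparsity sketch for a ReLU MLP (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)
d, d_ff, k = 256, 1024, 64                 # keep only k of d_ff neurons
x = rng.normal(size=d)
W1 = rng.normal(size=(d_ff, d)) / np.sqrt(d)
W2 = rng.normal(size=(d, d_ff)) / np.sqrt(d_ff)

h = np.maximum(W1 @ x, 0.0)                # dense activations (for the demo;
idx = np.argsort(h)[-k:]                   # a real system predicts idx cheaply)
y_sparse = W2[:, idx] @ h[idx]             # touch only k columns of W2
y_dense = W2 @ h
print(np.linalg.norm(y_dense - y_sparse) / np.linalg.norm(y_dense))
```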

Tri Dao (@tri_dao):

Surprisingly, speculative decoding works well not just for small-batch LLM inference but also for large batch and long context. Once we understood the compute & memory profile of LLM inference, the new spec dec algorithms fell out naturally.
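
A back-of-envelope version of that profile: each decode pass is dominated by reading the weights plus the KV cache, and a pass that verifies k drafted tokens reads roughly the same bytes as a pass that decodes one, so every accepted draft amortizes the traffic. All numbers below are illustrative assumptions, not measurements.

```python
# Rough memory-traffic model for decoding (illustrative numbers only).
bytes_weights = 7e9 * 2                            # 7B params in fp16
# per-sequence KV cache: 2 (K,V) * layers * kv_heads * head_dim * fp16 * seqlen
bytes_kv_per_seq = 2 * 32 * 8 * 128 * 2 * 100_000  # ~13 GB at 100k context

def pass_bytes(batch):
    # one forward pass (scoring 1 or k tokens per sequence) reads the weights
    # once and each sequence's KV cache once
    return bytes_weights + batch * bytes_kv_per_seq

batch, k = 64, 4
per_tok_plain = pass_bytes(batch) / batch       # vanilla: 1 token per pass
per_tok_spec = pass_bytes(batch) / (batch * k)  # spec dec, if all k accepted
print(f"{per_tok_plain / 1e9:.1f} GB vs {per_tok_spec / 1e9:.1f} GB read per token")
```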

Tri Dao (@tri_dao):

The work on spec dec for large batch and long context at Together AI has been a great collaboration with Beidi Chen and her lab. Check out their MagicDec; it's an elegant way to use e.g. StreamingLLM to reduce the KV cache of the draft model.
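
For flavor, a minimal sketch of a StreamingLLM-style cache for the draft model: keep a few "attention sink" tokens from the start of the sequence plus a sliding window of recent tokens, so the draft's KV cache stays constant-size regardless of context length. The class name and sizes are made up for illustration.

```python
# StreamingLLM-style KV cache sketch: attention sinks + sliding window.
from collections import deque

class SinkWindowKVCache:
    def __init__(self, n_sink=4, window=1024):
        self.n_sink = n_sink
        self.sink = []                       # first few tokens, kept forever
        self.recent = deque(maxlen=window)   # recent tokens, oldest evicted

    def append(self, kv):
        if len(self.sink) < self.n_sink:
            self.sink.append(kv)
        else:
            self.recent.append(kv)

    def contents(self):
        return self.sink + list(self.recent)

cache = SinkWindowKVCache(n_sink=2, window=3)
for t in range(8):
    cache.append(t)        # stand-in for a (key, value) pair
print(cache.contents())    # [0, 1, 5, 6, 7]: sinks + last 3 tokens
```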

Vipul Ved Prakash (@vipulved):

Excited to finally bring this product to market! The kernel team at Together AI, led by Tri Dao, has optimized a plethora of operators, some fairly fundamental ones, and made them available as kernels that can be registered in your training loop or inference