Dwarak Rajagopal (@dwarak) Twitter Tweets • TwiCopy

Dwarak Rajagopal

8 months ago

Arctic Ulysses from SnowflakeDB cuts time to first token (TTFT) by 6.8x for long-context LLMs with sequence parallelism. A game-changer for inference! 🚀 Read more: snowflake.com/en/engineering… #AI #LLM #Inference Kudos to Mert, Aurick Qiao, Jeff Rasley, Yuxiong He and Samyam Rajbhandari

12 likes · 1 reply · 1 repost

Dwarak Rajagopal

@dwarak

8 months ago

The future of enterprise AI is data-native. With Meta’s Llama 4 models now in Cortex AI, developers can build intelligent, multimodal applications and agents directly on their data—securely, efficiently, and at scale. This is just the beginning. x.com/SnowflakeDB/st…

2 likes · 0 replies · 0 reposts

Dwarak Rajagopal

@dwarak

7 months ago

Blazing fast inference! 🚀 Aurick Qiao shares how Arctic Inference + vLLM achieves the fastest LLM inference yet—up to 4x speedups. Best part? It's all open-sourced for the community! 💻 #AI #OpenSource #vLLM

2 likes · 0 replies · 0 reposts

Casper Hansen

@casper_hansen_

7 months ago

Almost a 5x speedup in vLLM🤯 I was able to push a finetuned Mistral Nemo from 110 tokens/s to a peak of 517 tokens/s and acceptance rate of 57.7%. This is with Suffix Decoding from ArcticInference⚡

253 likes · 7 replies · 29 reposts

Dwarak Rajagopal

@dwarak

7 months ago

Exciting news! The PyTorch Foundation’s expansion with vLLM and DeepSpeed is a game-changer for open-source AI. Can’t wait to see the innovations this brings! As a premier member, Snowflake is excited to join the Board and help grow the OSS community. Big things ahead! 🚀

12 likes · 0 replies · 7 reposts

Dwarak Rajagopal

@dwarak

7 months ago

🌟 Thrilled to announce Snowflake AI Research’s latest breakthroughs in Text-to-SQL! 🚀 ✅ #1 on the BIRD leaderboard, surpassing SOTA by 2.8% with Arctic-Text2SQL-R1-32B! ✅ #1 on Spider 2.0, mastering real-world challenges with groundbreaking innovation! ❄️ Our team at

3 likes · 0 replies · 0 reposts

Dwarak Rajagopal

@dwarak

7 months ago

Huge props to Hao Zhang and Hao AI Lab for FastVideo V1! This makes state-of-the-art video generation accessible and fast, revolutionizing how we approach distributed computing in AI. A must-try for every developer!

8 likes · 1 reply · 0 reposts

Dwarak Rajagopal

@dwarak

7 months ago

🤝 A new era for AI development! Percy Liang's Marin lab is redefining open-source AI with open development—fully transparent and collaborative. Join the movement! #AIInnovation #OpenScience

3 likes · 0 replies · 0 reposts

Dwarak Rajagopal

@dwarak

6 months ago

Shift Parallelism from Snowflake AI Research is a game-changer! 🚀 3.4x faster LLM inference with Arctic + vLLM. Loving the throughput boost!

3 likes · 1 reply · 0 reposts

Percy Liang

@percyliang

6 months ago

Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team: Tatsunori Hashimoto, Marcel Rød, Neil Band, Rohith Kuditipudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:

3.3K likes · 31 replies · 323 reposts

Stas Bekman

@stasbekman

5 months ago

As I'm diving into Sequence/Context parallelism in the last few days I wanted to share this write-up in 2 parts that nicely compares the few approaches out there and some of their combinations with papers: p1: insujang.github.io/2024-01-11/ten… p2: insujang.github.io/2024-09-20/int…

214 likes · 5 replies · 25 reposts

Aurick Qiao

@aurickq

5 months ago

Arctic Inference helps All Hands AI complete real-world coding tasks 2x faster through faster LLM inference. Check it out!

23 likes · 0 replies · 8 reposts

Stas Bekman

@stasbekman

5 months ago

Today is a deep dive into sequence tiling compute. Sequence tiling massively reduces activation memory footprint and can be applied to computations w/o token inter-dependency. The plot shows a huge memory saving with tiled fused logits loss computation. See section 3.1 in our

65 likes · 1 reply · 10 reposts
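The sequence-tiling idea in Stas Bekman's tweet (split a computation with no token inter-dependency into sequence chunks so only one chunk's activations exist at a time) can be illustrated with a minimal pure-Python sketch. It assumes a toy LM head and a per-token cross-entropy loss. All names (`token_logits`, `full_loss`, `tiled_loss`, `tile_size`) are hypothetical. This is not the code from the paper the tweet refers to: it shows the tiling only, not kernel fusion, and it covers the forward pass without showing how gradients are handled.

```python
import math
import random


def token_logits(hidden, weight):
    """Project one hidden vector onto every vocab row (a toy LM head)."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]


def cross_entropy(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def full_loss(hiddens, weight, targets):
    """Materializes the whole seq_len x vocab logits matrix before reducing it."""
    logits = [token_logits(h, weight) for h in hiddens]
    peak = len(logits) * len(weight)  # logits elements alive at once
    loss = sum(cross_entropy(l, t) for l, t in zip(logits, targets))
    return loss / len(targets), peak


def tiled_loss(hiddens, weight, targets, tile_size):
    """Same loss, but at most tile_size x vocab logits exist at any time.

    Valid because each token's loss depends only on its own hidden state,
    i.e. this computation has no token inter-dependency.
    """
    total, peak = 0.0, 0
    for start in range(0, len(hiddens), tile_size):
        tile = [token_logits(h, weight) for h in hiddens[start:start + tile_size]]
        peak = max(peak, len(tile) * len(weight))
        total += sum(cross_entropy(l, t)
                     for l, t in zip(tile, targets[start:start + tile_size]))
        # `tile` is rebound on the next iteration, so its logits can be freed
    return total / len(targets), peak


if __name__ == "__main__":
    random.seed(0)
    seq_len, dim, vocab = 64, 8, 100
    hiddens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
    weight = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(vocab)]
    targets = [random.randrange(vocab) for _ in range(seq_len)]

    full, full_peak = full_loss(hiddens, weight, targets)
    tiled, tiled_peak = tiled_loss(hiddens, weight, targets, tile_size=16)
    print(f"full:  loss={full:.6f}  peak logits={full_peak}")
    print(f"tiled: loss={tiled:.6f}  peak logits={tiled_peak}")
```

In this toy setup both functions return the same loss up to floating-point rounding, while the peak number of live logits drops from `seq_len * vocab` to `tile_size * vocab`. Attention-style computations, where tokens depend on each other, cannot be split this way without extra bookkeeping.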

Sriram Krishnan

@sriramk

5 months ago

🇺🇸 Today is a day we have been working towards for six months. We are announcing America’s AI action plan putting us on the road to continued AI dominance. The three core themes: - Accelerate AI innovation - Build American AI infrastructure - Lead in international AI

4.4K likes · 220 replies · 674 reposts

Snowflake

@snowflakedb

4 months ago

We are thrilled to announce that OpenAI’s most advanced model, GPT-5, is now available natively on Snowflake Cortex AI for customers to use. This integration unlocks a wide range of enterprise use cases within Snowflake’s secure, governed environment: ❄️ Transform data into

25 likes · 0 replies · 3 reposts