Jeff Rasley (@jeffra45)'s Twitter Profile
Jeff Rasley

@jeffra45

@SnowflakeDB AI Research Team. @DeepSpeedAI co-founder, @BrownCSDept PhD, @uwcse alum

ID: 25111286

Joined: 18-03-2009 17:24:34

605 Tweets

760 Followers

994 Following

Yusuf Ozuysal (@yusufozuysal)'s Twitter Profile Photo

How do faster inference (up to 16x for embedding models!) and better Text2SQL through RL sound? A jam-packed launch from Snowflake AI research team detailing the technologies bundled in our ArcticInference framework and also diving deeper into how the model at the top of the

Aurick Qiao (@aurickq)'s Twitter Profile Photo

Excited to open-source Shift Parallelism, developed at Snowflake AI Research for LLM inference! With it, Arctic Inference + vLLM delivers:

🚀 3.4x faster e2e latency & 1.06x higher throughput
🚀 1.7x faster generation & 2.25x lower response time
🚀 16x higher throughput

Stas Bekman (@stasbekman)'s Twitter Profile Photo

In inference one usually gets either high throughput or low latency, but not both - enter Shift Parallelism, which automatically adapts for the best performance!
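The idea of adapting the parallelism strategy to the load can be sketched as a toy dispatcher. This is only an illustration of the throughput-vs-latency trade-off described above, not Arctic Inference's actual implementation; the function name and the token threshold are invented for the example.

```python
def choose_parallelism(num_pending_tokens: int, threshold: int = 256) -> str:
    """Toy sketch: small, latency-bound batches favor tensor parallelism
    (all GPUs cooperate on one request); large, throughput-bound batches
    favor sequence/data parallelism (GPUs work on independent shards).
    The real system switches strategies dynamically inside the engine."""
    return "tensor" if num_pending_tokens < threshold else "sequence"

# A latency-sensitive trickle of requests vs. a saturated batch:
print(choose_parallelism(10))    # small load -> "tensor"
print(choose_parallelism(4096))  # heavy load -> "sequence"
```

The point of the sketch is only that a single deployment can serve both regimes by shifting strategy at runtime, rather than being configured for one or the other.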

Léo (@leik0w0)'s Twitter Profile Photo

Hello everyone! I'm releasing my first blog post ever, please go check it out if you like funky GPU builds! leikoe.github.io

Jeff Rasley (@jeffra45)'s Twitter Profile Photo

Excited to share what we’ve been working on! 🚀 Stas has done an incredible job hitting major scaling milestones and making long-sequence training more accessible. Feedback and feature requests are more than welcome - take a look! :)

Rajhans Samdani (@rajhans_samdani)'s Twitter Profile Photo

The team that democratized ML model training with DeepSpeed is now building Arctic Training. Don't sleep on it. This is likely the best open-source training infra.

Snowflake (@snowflakedb)'s Twitter Profile Photo

Arctic Long Sequence Training (ALST) is here! This new open-source contribution tackles complex LLM training challenges. ALST provides modular techniques for efficiently training models on sequences up to 15 million tokens, achieving up to a 469x improvement in max trainable sequence length.

Stas Bekman (@stasbekman)'s Twitter Profile Photo

A deep dive into activation memory offloading:

Activation checkpointing helps save a ton of GPU memory, but those checkpoint tensors are still huge when long sequence lengths are used. Why not offload them to CPU memory? The attached memory profiler diagram shows the memory

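The offloading pattern described above can be sketched with stock PyTorch: `torch.autograd.graph.save_on_cpu` moves tensors saved for backward to CPU memory and copies them back when gradients are computed. This is only a minimal sketch of the general idea; ALST's actual offloading layers tiling, pinned memory, and compute overlap on top of this, and the model shapes here are arbitrary.

```python
import torch

# A small MLP whose forward-pass activations would normally be kept
# on the compute device until backward runs.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 1024),
    torch.nn.GELU(),
    torch.nn.Linear(1024, 256),
)
x = torch.randn(8, 256, requires_grad=True)

# Inside this context, tensors saved for backward are parked in CPU
# memory instead of occupying accelerator memory.
with torch.autograd.graph.save_on_cpu(pin_memory=False):
    y = model(x)

loss = y.sum()
loss.backward()  # saved tensors are fetched back from CPU as needed
```

On a CPU-only machine this is a no-op performance-wise; the savings appear when the model and activations live on a GPU and the saved tensors would otherwise dominate device memory.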
Canwen Xu (@xucanwen)'s Twitter Profile Photo

❄️We're looking for an MLE/Applied Scientist to join our Snowflake AI team to work on AI + software engineering. If you have a sharp eye for developers' pain points and can solve them with AI, this job is just for you! 👉Apply here: careers.snowflake.com/us/en/job/SNCO…

Axolotl (@axolotl_ai)'s Twitter Profile Photo

Axolotl v0.11.0 is out! We've included ALST's TiledMLP for longer-sequence training, as well as support for Devstral, DenseMixer (MoE performance), and the most recent releases of transformers 4.53.1, accelerate 1.8.1, and FlashAttention2.
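The idea behind a tiled MLP is that the feed-forward block acts on each token independently, so the sequence dimension can be processed in chunks to bound peak activation memory. A toy NumPy sketch of that equivalence (shapes, names, and tile size are illustrative, not Axolotl's or ALST's actual API):

```python
import numpy as np

def mlp(x, w1, w2):
    """Plain two-layer ReLU MLP applied over all tokens at once."""
    h = np.maximum(x @ w1, 0.0)  # [tokens, d_ff] activation lives in full
    return h @ w2                # [tokens, d_model]

def tiled_mlp(x, w1, w2, tile=4):
    """Same computation, but the sequence is processed in tiles, so the
    [tile, d_ff] intermediate is the largest activation ever alive."""
    outs = [mlp(x[i:i + tile], w1, w2) for i in range(0, x.shape[0], tile)]
    return np.concatenate(outs, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))    # 16 tokens, d_model=8
w1 = rng.standard_normal((8, 32))   # d_ff=32
w2 = rng.standard_normal((32, 8))

# Tiling changes memory behavior, not the result.
print(np.allclose(tiled_mlp(x, w1, w2), mlp(x, w1, w2)))  # True
```

Because the outputs are bit-for-bit the same computation chunked differently, tiling trades a little launch overhead for a much smaller peak activation footprint at long sequence lengths.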

Wing Lian (caseus) (@winglian)'s Twitter Profile Photo

Huge thanks to Stas Bekman for his patience and mentorship in getting ALST/TiledMLP working in Axolotl! So far we've been able to get 400k-token full-parameter fine-tuning working on a single H100 (system-RAM constrained), and we'll have updated numbers soon for Axolotl on