Jeff Rasley (@jeffra45)'s Twitter Profile
Jeff Rasley

@jeffra45

@SnowflakeDB AI Research Team. @DeepSpeedAI co-founder, @BrownCSDept PhD, @uwcse alum

ID: 25111286

Joined: 18-03-2009 17:24:34

605 Tweets

760 Followers

994 Following

Yusuf Ozuysal (@yusufozuysal)'s Twitter Profile Photo

How do faster inference (up to 16x for embedding models!) and better Text2SQL through RL sound? A jam-packed launch from Snowflake AI research team detailing the technologies bundled in our ArcticInference framework and also diving deeper into how the model at the top of the

Aurick Qiao (@aurickq)'s Twitter Profile Photo

Excited to open-source Shift Parallelism, developed at Snowflake AI Research for LLM inference! With it, Arctic Inference + vLLM delivers:

🚀 3.4x faster e2e latency & 1.06x higher throughput
🚀 1.7x faster generation & 2.25x lower response time
🚀 16x higher throughput

Stas Bekman (@stasbekman)'s Twitter Profile Photo

In inference one usually gets either high throughput or low latency, but not both - enter Shift Parallelism, which automatically adapts for the best performance!
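The idea of adapting the parallelism strategy to the load can be sketched as a toy dispatcher. This is only an illustration of the throughput-vs-latency trade-off described above, not Arctic Inference's actual implementation; the function name and the token threshold are invented for the example.

```python
def choose_parallelism(num_pending_tokens: int, threshold: int = 256) -> str:
    """Toy sketch: small, latency-bound batches favor tensor parallelism
    (all GPUs cooperate on one request); large, throughput-bound batches
    favor sequence/data parallelism (GPUs work on independent shards).
    The real system switches strategies dynamically inside the engine."""
    return "tensor" if num_pending_tokens < threshold else "sequence"

# A latency-sensitive trickle of requests vs. a saturated batch:
print(choose_parallelism(10))    # small load -> "tensor"
print(choose_parallelism(4096))  # heavy load -> "sequence"
```

The point of the sketch is only that a single deployment can serve both regimes by shifting strategy at runtime, rather than being configured for one or the other.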

Léo (@leik0w0)'s Twitter Profile Photo

Hello everyone! I'm releasing my first blog post ever, please go check it out if you like funky GPU builds! leikoe.github.io

Jeff Rasley (@jeffra45)'s Twitter Profile Photo

Excited to share what we’ve been working on! 🚀 Stas has done an incredible job hitting major scaling milestones and making long-sequence training more accessible. Feedback and feature requests are more than welcome - take a look! :)

Rajhans Samdani (@rajhans_samdani)'s Twitter Profile Photo

The team that democratized ML model training with DeepSpeed is now building Arctic Training. Don't sleep on it. This is likely the best open-source training infra.

Snowflake (@snowflakedb)'s Twitter Profile Photo

Arctic Long Sequence Training (ALST) is here! This new open-source contribution tackles complex LLM training challenges. ALST provides modular techniques for efficiently training models on sequences up to 15 million tokens, achieving up to a 469x improvement in max trainable sequence length.

Stas Bekman (@stasbekman)'s Twitter Profile Photo

A deep dive into activation memory offloading:

Activation checkpointing helps save a ton of GPU memory, but those checkpoint tensors are still huge when long sequence lengths are used. Why not offload them to CPU memory? The attached memory profiler diagram shows the memory

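The offloading pattern described above can be sketched with stock PyTorch: `torch.autograd.graph.save_on_cpu` moves tensors saved for backward to CPU memory and copies them back when gradients are computed. This is only a minimal sketch of the general idea; ALST's actual offloading layers tiling, pinned memory, and compute overlap on top of this, and the model shapes here are arbitrary.

```python
import torch

# A small MLP whose forward-pass activations would normally be kept
# on the compute device until backward runs.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 1024),
    torch.nn.GELU(),
    torch.nn.Linear(1024, 256),
)
x = torch.randn(8, 256, requires_grad=True)

# Inside this context, tensors saved for backward are parked in CPU
# memory instead of occupying accelerator memory.
with torch.autograd.graph.save_on_cpu(pin_memory=False):
    y = model(x)

loss = y.sum()
loss.backward()  # saved tensors are fetched back from CPU as needed
```

On a CPU-only machine this is a no-op performance-wise; the savings appear when the model and activations live on a GPU and the saved tensors would otherwise dominate device memory.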
Canwen Xu (@xucanwen)'s Twitter Profile Photo

❄️We're looking for an MLE/Applied Scientist to join our Snowflake AI team to work on AI + software engineering. If you have a sharp eye for developers' pain points and can solve them with AI, this job is just for you! 👉Apply here: careers.snowflake.com/us/en/job/SNCO…

Axolotl (@axolotl_ai)'s Twitter Profile Photo

Axolotl v0.11.0 is out! We've included ALST's TiledMLP for longer-sequence training, as well as support for Devstral, DenseMixer (MoE performance), and the most recent releases of transformers 4.53.1, accelerate 1.8.1, and FlashAttention2.
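The idea behind a tiled MLP is that the feed-forward block acts on each token independently, so the sequence dimension can be processed in chunks to bound peak activation memory. A toy NumPy sketch of that equivalence (shapes, names, and tile size are illustrative, not Axolotl's or ALST's actual API):

```python
import numpy as np

def mlp(x, w1, w2):
    """Plain two-layer ReLU MLP applied over all tokens at once."""
    h = np.maximum(x @ w1, 0.0)  # [tokens, d_ff] activation lives in full
    return h @ w2                # [tokens, d_model]

def tiled_mlp(x, w1, w2, tile=4):
    """Same computation, but the sequence is processed in tiles, so the
    [tile, d_ff] intermediate is the largest activation ever alive."""
    outs = [mlp(x[i:i + tile], w1, w2) for i in range(0, x.shape[0], tile)]
    return np.concatenate(outs, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))    # 16 tokens, d_model=8
w1 = rng.standard_normal((8, 32))   # d_ff=32
w2 = rng.standard_normal((32, 8))

# Tiling changes memory behavior, not the result.
print(np.allclose(tiled_mlp(x, w1, w2), mlp(x, w1, w2)))  # True
```

Because the outputs are bit-for-bit the same computation chunked differently, tiling trades a little launch overhead for a much smaller peak activation footprint at long sequence lengths.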

Wing Lian (caseus) (@winglian)'s Twitter Profile Photo

Huge thanks to Stas Bekman for his patience and mentorship in getting ALST/TiledMLP working in Axolotl! So far we've been able to get 400k-token full-parameter fine-tuning working on a single H100 (system-RAM constrained), and we'll have updated numbers soon for Axolotl on