Stas Bekman (@stasbekman)'s Twitter Profile
Stas Bekman

@stasbekman

Toolmaker. Software creator, optimizer and harmonizer.

Makes ML systems work and fly @ Snowflake.

ID: 1068360975898660864

Link: https://stasosphere.com/machine-learning/ · Joined: 30-11-2018 04:28:00

2.2K Tweets

8.8K Followers

282 Following

Stas Bekman (@stasbekman)'s Twitter Profile Photo

Some time back I asked the DeepSpeed team to add torch.compile support - they did, and went beyond that by creating DeepCompile, which can now massively speed up your training workloads - check out those plots! Wow! From talking to the developers, this is just the beginning, and more

Stas Bekman (@stasbekman)'s Twitter Profile Photo

Very long context models are coming, e.g. NVIDIA's UltraLong 4M-token series: huggingface.co/nvidia/Llama-3… But how do you finetune for such long sequence lengths? Soon we will post working code that can finetune with multi-million-token sequence lengths for HF Transformers.

Stas Bekman (@stasbekman)'s Twitter Profile Photo

Modern art. 

Artist: PyTorch memory profiler

Model: Llama-8B

The piece on the left is the Forward pass 

The piece on the right is the Backward pass
Stas Bekman (@stasbekman)'s Twitter Profile Photo

Have you figured out how to estimate FLOPs for Flash Attention 2 w/ packed samples?

The formula the paper gives leads to about 2-3x what it should be

Sections 4.1 and 4.2 of the FA2 paper can't decide what the right formula should be :( it suggests 14x in 4.1 and 6x or
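Whatever the right per-layer constant turns out to be, one part is uncontroversial: attention cost grows with the square of each segment's length, so packed samples must be counted as the sum of the per-sample s_i^2, not (sum of s_i)^2. A back-of-envelope sketch (the function name and the 4*s^2*d matmul count per head are my assumptions, not the FA2 paper's formula):

```python
def attn_flops(seqlens, head_dim, n_heads, causal=True):
    """Rough matmul FLOPs for one attention layer over packed samples.

    Each packed segment of length s contributes two s x s matmuls
    (QK^T and PV), i.e. ~4 * s^2 * head_dim * n_heads FLOPs; a causal
    mask roughly halves that. The key point for packing: cost scales
    with sum(s_i^2), not (sum(s_i))^2.
    """
    total = sum(4 * s * s * head_dim * n_heads for s in seqlens)
    return total // 2 if causal else total

# Two packed 2k-token samples cost half of one 4k-token sample:
packed = attn_flops([2048, 2048], head_dim=128, n_heads=32)
single = attn_flops([4096], head_dim=128, n_heads=32)
assert single == 2 * packed
```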
Stas Bekman (@stasbekman)'s Twitter Profile Photo

I have just realized I have never mentioned github.com/stas00/make-to… When I need to release a new package I just type: make release and have it bump the version, update CHANGES.md, tag the release, start a new dev branch, commit all that, and build the pip/conda
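The real recipe lives at the truncated link above; purely as an illustration of the steps such a target chains together (every name and file below is hypothetical, not the actual make-tools code):

```shell
#!/bin/sh
# Hypothetical release helper - illustrative only, not the make-tools recipe.

bump_patch() {  # "1.2.3" -> "1.2.4"
    major=${1%%.*}; rest=${1#*.}
    minor=${rest%%.*}; patch=${rest#*.}
    echo "$major.$minor.$((patch + 1))"
}

release() {
    old=$(cat VERSION)
    new=$(bump_patch "$old")
    echo "$new" > VERSION
    # prepend a new section to the changelog
    printf '## %s\n\n' "$new" | cat - CHANGES.md > CHANGES.tmp && mv CHANGES.tmp CHANGES.md
    git commit -am "release $new"     # commit version bump + changelog
    git tag "v$new"                   # tag the release
    git checkout -b "dev-$new"        # start the next dev branch
    python -m build                   # build the pip sdist/wheel
}
```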

Dwarak Rajagopal (@dwarak)'s Twitter Profile Photo

Snowflake AI Research team is on fire! 🔥 Thrilled for our breakthroughs across embeddings, inference, & SQL generation - pioneering practical research that directly tackles critical real-world challenges for enterprise users! #AI #SnowflakeAI

Stas Bekman (@stasbekman)'s Twitter Profile Photo

In inference you usually get either high throughput or low latency, but not both - enter shift parallelism, which automatically adapts for the best performance!