Ammar Ahmad Awan (@ammar_awan) 's Twitter Profile
Ammar Ahmad Awan

@ammar_awan

DeepSpeed-er @Microsoft, @MSFTDeepSpeed, Father, PhD, Wanna-be Professor, Technology Enthusiast.

ID: 372102004

Link: http://awan-10.github.io | Joined: 12-09-2011 04:03:46

475 Tweets

259 Followers

532 Following

Stas Bekman (@stasbekman) 's Twitter Profile Photo

If you were holding off on trying @MSFTDeepSpeed ZeRO++, it looks like deepspeed@master should work well now: github.com/microsoft/Deep… ZeRO++'s main feature is allowing you to use a hybrid approach if you can fit a model on a single node of 8 GPUs. So it takes advantage of the super
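In DeepSpeed config terms, that hybrid (hpZ) mode comes down to a few zero_optimization keys. A minimal sketch, assuming a single node of 8 GPUs and the key names from the DeepSpeed ZeRO++ tutorial:

    # Minimal ZeRO++ sketch (key names assumed from the DeepSpeed ZeRO++ tutorial).
    ds_config = {
        "zero_optimization": {
            "stage": 3,
            "zero_hpz_partition_size": 8,       # hpZ: secondary weight shards stay within the 8-GPU node
            "zero_quantized_weights": True,     # qwZ: quantized weight all-gather
            "zero_quantized_gradients": True,   # qgZ: quantized gradient reduce-scatter
            "overlap_comm": True,
            "contiguous_gradients": True,
        }
    }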

Quentin Anthony (@quentinanthon15) 's Twitter Profile Photo

Getting the most out of your hardware when training transformers requires thinking about your model as a sequence of GPU kernel calls. This mindset, common in HPC, is rare in ML and leads to inefficiencies in LLM training. Learn more in our paper arxiv.org/abs/2401.14489
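One way to adopt that mindset is to profile a single transformer block and read off the CUDA kernels it actually launches; the sketch below is illustrative only and not taken from the paper:

    # Illustrative only: list the CUDA kernels one transformer block launches.
    import torch
    from torch.profiler import profile, ProfilerActivity

    layer = torch.nn.TransformerEncoderLayer(d_model=1024, nhead=16).cuda().half()
    x = torch.randn(512, 8, 1024, device="cuda", dtype=torch.half)

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        layer(x)

    # Each row is a kernel call; fusing, resizing, or eliminating these calls is
    # where HPC-style tuning of LLM training happens.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))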

DeepSpeed (@deepspeedai) 's Twitter Profile Photo

Are you a #DeepSpeed user, fan, contributor, and/or advocate? Are you interested in meeting people behind @MSFTDeepSpeed tech? Are you interested in #AI? If yes, come and meet the team at our first in-person meetup in the Seattle area! Register here: developer.microsoft.com/reactor/events…

Stas Bekman (@stasbekman) 's Twitter Profile Photo

The other news is the introduction of @MSFTDeepSpeed Meetups, which will be held about once every 3 months. The inaugural one will be on Feb 12, 6:00 PM - 8:00 PM at the Redmond Reactor developer.microsoft.com/en-us/reactor/… Quote: "This will be the first ever meetup for the DeepSpeed

DeepSpeed (@deepspeedai) 's Twitter Profile Photo

Thanks, Stas Bekman! The DeepSpeed team is hiring for various engineering and research roles! Come join us and steer the future of large-scale AI training and inference.

DeepSpeed (@deepspeedai) 's Twitter Profile Photo

#DeepSpeed joins forces with University of Sydney to unveil an exciting tech #FP6. Just supply your FP16 models, and we deliver:
🚀 1.5x performance boost for #LLMs serving on #GPUs
🚀 Innovative (4+2)-bit system design
🚀 Quality-preserving quantization
link: github.com/microsoft/Deep…
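A hedged sketch of what using it can look like through DeepSpeed-MII; the model name is only an example, and the quantization_mode string follows the FP6 announcement and may change across releases:

    # Hedged sketch: serve an off-the-shelf FP16 checkpoint with FP6 weights via DeepSpeed-MII.
    # The quantization_mode value is assumed from the FP6 announcement.
    import mii

    pipe = mii.pipeline("NousResearch/Llama-2-70b-hf", quantization_mode="wf6af16")
    print(pipe(["DeepSpeed is"], max_new_tokens=64))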

Yann LeCun (@ylecun) 's Twitter Profile Photo

* Language is low bandwidth: less than 12 bytes/second. A person can read 270 words/minute, or 4.5 words/second, which is 12 bytes/s (assuming 2 bytes per token and 0.75 words per token). A modern LLM is typically trained with 1x10^13 two-byte tokens, which is 2x10^13 bytes.
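The arithmetic behind those numbers, spelled out with the same assumptions as the tweet:

    # Reading bandwidth vs. LLM training data, using the tweet's assumptions.
    words_per_minute = 270
    words_per_token  = 0.75
    bytes_per_token  = 2

    words_per_second  = words_per_minute / 60                 # 4.5 words/s
    tokens_per_second = words_per_second / words_per_token    # 6 tokens/s
    bytes_per_second  = tokens_per_second * bytes_per_token   # 12 bytes/s of reading

    training_tokens = 1e13
    training_bytes  = training_tokens * bytes_per_token       # 2e13 bytes of training data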

Stas Bekman (@stasbekman) 's Twitter Profile Photo

If you're trying to run MoE Mixtral-8x7b under @MSFTDeepSpeed, it's likely to hang on the first forward pass. The solution is here: github.com/microsoft/Deep… and you need deepspeed>=0.13.0. Thanks to Masahiro Tanaka for the fix. edit: looks like someone codified it even better:
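A tiny guard, not from the linked issue, just a convenience check that the installed DeepSpeed isn't older than the release containing that fix:

    # Fail fast if DeepSpeed predates the MoE fix referenced above (>=0.13.0).
    import deepspeed
    from packaging import version

    assert version.parse(deepspeed.__version__) >= version.parse("0.13.0"), \
        "Mixtral-8x7B MoE may hang on the first forward pass; upgrade DeepSpeed."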

Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

phi-3 is here, and it's ... good :-). I made a quick short demo to give you a feel of what phi-3-mini (3.8B) can do. Stay tuned for the open weights release and more announcements tomorrow morning! (And ofc this wouldn't be complete without the usual table of benchmarks!)

Dalia Mogahed (@dmogahed) 's Twitter Profile Photo

Brought me to tears. She’s so respected by her peers. What an achievement. Academic excellence and ethical leadership. Asna Tabassum, we salute you.

DeepSpeed (@deepspeedai) 's Twitter Profile Photo

Announcing that DeepSpeed now runs natively on Windows. This exciting combination brings DeepSpeed optimizations to Windows users and empowers more people and organizations with AI innovations.
- HF Inference & Finetuning
- LoRA
- CPU Offload
Blog: shorturl.at/a7TF8
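As a rough sketch (not the exact config from the blog), the same ZeRO CPU-offload settings DeepSpeed uses elsewhere now apply on a Windows install as well:

    # Sketch of a ZeRO stage-2 config with optimizer-state offload to host RAM.
    import deepspeed

    ds_config = {
        "train_batch_size": 8,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 2,
            "offload_optimizer": {"device": "cpu"},   # keep optimizer state in CPU memory
        },
    }

    # model = ...  # e.g. a Hugging Face model
    # engine, _, _, _ = deepspeed.initialize(model=model,
    #                                        model_parameters=model.parameters(),
    #                                        config=ds_config)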

MVAPICH (@mvapich) 's Twitter Profile Photo

Dr. Ammar Ahmad Awan from Microsoft DeepSpeed giving a presentation at MUG '24 on trillion-parameter LLMs and optimization with MVAPICH. OSUengineering Microsoft OH-TECH MVAPICH @MSFTDeepSpeed @MSFTDeepSpeedJP #MUG24 #MPI #AI #LLM #DeepSpeed

DeepSpeed (日本語アカウント) (@deepspeedai_jp) 's Twitter Profile Photo

At an event held at Ohio State University, team member Ammar Ahmad Awan gave a talk on DeepSpeed optimizations! Ohio State University is widely known for its research on distributed and parallel processing, and many members of the DeepSpeed team are alumni.

Byron Hsu (@hsu_byron) 's Twitter Profile Photo

(1/n) Training LLMs can be hindered by out-of-memory errors when scaling batch size and sequence length. Add one line to boost multi-GPU training throughput by 20% and reduce memory usage by 60%. Introducing Liger-Kernel: Efficient Triton Kernels for LLM Training. github.com/linkedin/Liger…
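The "one line" in practice, sketched under the assumption that the patching helpers documented in the linkedin/Liger-Kernel README are used (the exact function name depends on the model family):

    # Assumed API from the Liger-Kernel README: patch Llama modules with fused Triton kernels.
    from transformers import AutoModelForCausalLM
    from liger_kernel.transformers import apply_liger_kernel_to_llama

    apply_liger_kernel_to_llama()   # fused RMSNorm, RoPE, SwiGLU, and cross-entropy kernels
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")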

Jeff Rasley (@jeffra45) 's Twitter Profile Photo

🧵1/ New release from Snowflake AI Research: Shift Parallelism is a new LLM inference technique built on top of vLLM, released through ArcticInference. It dramatically improves latency while preserving high throughput. Here’s what it looks like in action 👇