Ammar Ahmad Awan (@ammar_awan) 's Twitter Profile
Ammar Ahmad Awan

@ammar_awan

DeepSpeed-er @Microsoft, @MSFTDeepSpeed, Father, PhD, Wanna-be Professor, Technology Enthusiast.

ID: 372102004

Link: http://awan-10.github.io | Joined: 12-09-2011 04:03:46

475 Tweets

259 Followers

532 Following

Stas Bekman (@stasbekman) 's Twitter Profile Photo

If you were holding off on trying @MSFTDeepSpeed ZeRO++, it looks like deepspeed@master should work well now: github.com/microsoft/Deep… ZeRO++'s main feature is allowing you to use a hybrid approach if you can fit a model on a single node of 8 GPUs. So it takes advantage of the super
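In DeepSpeed config terms, that hybrid (hpZ) mode comes down to a few zero_optimization keys. A minimal sketch, assuming a single node of 8 GPUs and the key names from the DeepSpeed ZeRO++ tutorial:

    # Minimal ZeRO++ sketch (key names assumed from the DeepSpeed ZeRO++ tutorial).
    ds_config = {
        "zero_optimization": {
            "stage": 3,
            "zero_hpz_partition_size": 8,       # hpZ: secondary weight shards stay within the 8-GPU node
            "zero_quantized_weights": True,     # qwZ: quantized weight all-gather
            "zero_quantized_gradients": True,   # qgZ: quantized gradient reduce-scatter
            "overlap_comm": True,
            "contiguous_gradients": True,
        }
    }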

Quentin Anthony (@quentinanthon15) 's Twitter Profile Photo

Getting the most out of your hardware when training transformers requires thinking about your model as a sequence of GPU kernel calls. This mindset, common in HPC, is rare in ML and leads to inefficiencies in LLM training. Learn more in our paper arxiv.org/abs/2401.14489
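One way to adopt that mindset is to profile a single transformer block and read off the CUDA kernels it actually launches; the sketch below is illustrative only and not taken from the paper:

    # Illustrative only: list the CUDA kernels one transformer block launches.
    import torch
    from torch.profiler import profile, ProfilerActivity

    layer = torch.nn.TransformerEncoderLayer(d_model=1024, nhead=16).cuda().half()
    x = torch.randn(512, 8, 1024, device="cuda", dtype=torch.half)

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        layer(x)

    # Each row is a kernel call; fusing, resizing, or eliminating these calls is
    # where HPC-style tuning of LLM training happens.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))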

DeepSpeed (@deepspeedai) 's Twitter Profile Photo

Are you a #DeepSpeed user, fan, contributor, and/or advocate? Are you interested in meeting people behind @MSFTDeepSpeed tech? Are you interested in #AI? If yes, come and meet the team at our first in-person meetup in the Seattle area! Register here: developer.microsoft.com/reactor/events…

Stas Bekman (@stasbekman) 's Twitter Profile Photo

The other news is the introduction of @MSFTDeepSpeed Meetups, which will be held about once every 3 months. The inaugural one will be on Feb 12, 6:00 PM - 8:00 PM at the Redmond Reactor developer.microsoft.com/en-us/reactor/… Quote: "This will be the first ever meetup for the DeepSpeed

DeepSpeed (@deepspeedai) 's Twitter Profile Photo

Thanks, Stas Bekman! The DeepSpeed team is hiring for various engineering and research roles! Come join us and steer the future of large-scale AI training and inference.

DeepSpeed (@deepspeedai) 's Twitter Profile Photo

#DeepSpeed joins forces with University of Sydney to unveil an exciting tech #FP6. Just supply your FP16 models, and we deliver:
🚀 1.5x performance boost for #LLMs serving on #GPUs
🚀 Innovative (4+2)-bit system design
🚀 Quality-preserving quantization
link: github.com/microsoft/Deep…
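A hedged sketch of what using it can look like through DeepSpeed-MII; the model name is only an example, and the quantization_mode string follows the FP6 announcement and may change across releases:

    # Hedged sketch: serve an off-the-shelf FP16 checkpoint with FP6 weights via DeepSpeed-MII.
    # The quantization_mode value is assumed from the FP6 announcement.
    import mii

    pipe = mii.pipeline("NousResearch/Llama-2-70b-hf", quantization_mode="wf6af16")
    print(pipe(["DeepSpeed is"], max_new_tokens=64))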

Yann LeCun (@ylecun) 's Twitter Profile Photo

* Language is low bandwidth: less than 12 bytes/second. A person can read 270 words/minute, or 4.5 words/second, which is 12 bytes/s (assuming 2 bytes per token and 0.75 words per token). A modern LLM is typically trained with 1x10^13 two-byte tokens, which is 2x10^13 bytes.
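The arithmetic behind those numbers, spelled out with the same assumptions as the tweet:

    # Reading bandwidth vs. LLM training data, using the tweet's assumptions.
    words_per_minute = 270
    words_per_token  = 0.75
    bytes_per_token  = 2

    words_per_second  = words_per_minute / 60                 # 4.5 words/s
    tokens_per_second = words_per_second / words_per_token    # 6 tokens/s
    bytes_per_second  = tokens_per_second * bytes_per_token   # 12 bytes/s of reading

    training_tokens = 1e13
    training_bytes  = training_tokens * bytes_per_token       # 2e13 bytes of training data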

Stas Bekman (@stasbekman) 's Twitter Profile Photo

If you're trying to run MoE Mixtral-8x7b under @MSFTDeepSpeed, it's likely to hang on the first forward pass. The solution is here: github.com/microsoft/Deep… and you need deepspeed>=0.13.0. Thanks to Masahiro Tanaka for the fix. edit: looks like someone codified it even better:
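A tiny guard, not from the linked issue, just a convenience check that the installed DeepSpeed isn't older than the release containing that fix:

    # Fail fast if DeepSpeed predates the MoE fix referenced above (>=0.13.0).
    import deepspeed
    from packaging import version

    assert version.parse(deepspeed.__version__) >= version.parse("0.13.0"), \
        "Mixtral-8x7B MoE may hang on the first forward pass; upgrade DeepSpeed."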

Sebastien Bubeck (@sebastienbubeck) 's Twitter Profile Photo

phi-3 is here, and it's ... good :-). I made a quick short demo to give you a feel of what phi-3-mini (3.8B) can do. Stay tuned for the open weights release and more announcements tomorrow morning! (And ofc this wouldn't be complete without the usual table of benchmarks!)

Dalia Mogahed (@dmogahed) 's Twitter Profile Photo

Brought me to tears. She’s so respected by her peers. What an achievement. Academic excellence and ethical leadership. Asna Tabassum, we salute you.

DeepSpeed (@deepspeedai) 's Twitter Profile Photo

Announcing that DeepSpeed now runs natively on Windows. This exciting combination brings DeepSpeed optimizations to Windows users and empowers more people and organizations with AI innovations.
- HF Inference & Finetuning
- LoRA
- CPU Offload
Blog: shorturl.at/a7TF8
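As a rough sketch (not the exact config from the blog), the same ZeRO CPU-offload settings DeepSpeed uses elsewhere now apply on a Windows install as well:

    # Sketch of a ZeRO stage-2 config with optimizer-state offload to host RAM.
    import deepspeed

    ds_config = {
        "train_batch_size": 8,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 2,
            "offload_optimizer": {"device": "cpu"},   # keep optimizer state in CPU memory
        },
    }

    # model = ...  # e.g. a Hugging Face model
    # engine, _, _, _ = deepspeed.initialize(model=model,
    #                                        model_parameters=model.parameters(),
    #                                        config=ds_config)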

MVAPICH (@mvapich) 's Twitter Profile Photo

Dr. Ammar Ahmad Awan from Microsoft DeepSpeed giving a presentation at MUG '24 on trillion-parameter LLMs and optimization with MVAPICH. OSUengineering Microsoft OH-TECH MVAPICH @MSFTDeepSpeed @MSFTDeepSpeedJP #MUG24 #MPI #AI #LLM #DeepSpeed

DeepSpeed (日本語アカウント) (@deepspeedai_jp) 's Twitter Profile Photo

At an event held at Ohio State University, team member Ammar Ahmad Awan gave a talk on DeepSpeed optimizations! Ohio State University is widely known for its research on distributed and parallel processing, and many members of the DeepSpeed team are alumni.

Byron Hsu (@hsu_byron) 's Twitter Profile Photo

(1/n) Training LLMs can be hindered by out-of-memory errors when scaling batch size and sequence length. Add one line to boost multi-GPU training throughput by 20% and reduce memory usage by 60%. Introducing Liger-Kernel: Efficient Triton Kernels for LLM Training. github.com/linkedin/Liger…
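The "one line" in practice, sketched under the assumption that the patching helpers documented in the linkedin/Liger-Kernel README are used (the exact function name depends on the model family):

    # Assumed API from the Liger-Kernel README: patch Llama modules with fused Triton kernels.
    from transformers import AutoModelForCausalLM
    from liger_kernel.transformers import apply_liger_kernel_to_llama

    apply_liger_kernel_to_llama()   # fused RMSNorm, RoPE, SwiGLU, and cross-entropy kernels
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")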

Jeff Rasley (@jeffra45) 's Twitter Profile Photo

🧵1/ New release from Snowflake AI Research: Shift Parallelism is a new LLM inference technique built on top of vLLM, released through ArcticInference. It dramatically improves latency while preserving high throughput. Here’s what it looks like in action 👇