Animesh Singh (@animeshsingh) 's Twitter Profile
Animesh Singh

@animeshsingh

#ArtificialIntelligence #DeepLearning #MachineLearning #MLOps #AI #ML #DL #Kubernetes #Cloud

ID: 28748501

Joined: 04-04-2009 05:39:48

5.5K Tweets

2.2K Followers

2.2K Following

Yam Peleg (@yampeleg) 's Twitter Profile Photo

You thought you could go to sleep now??

Orca 2 just dropped.
Paper: arxiv.org/pdf/2311.11045…

Results:
Orca 2 13B beats LLaMA-Chat-70B

TL;DR:
Training a smaller model to reason using multiple techniques:

step-by-step, recall then generate, recall-reason-generate, direct
Alex Volkov (Thursd/AI) (@altryne) 's Twitter Profile Photo

Lol what, LinkedIn casually dropped a kernel repo that reduces LLM training time on multi-GPU by 20% while reducing memory by 60%? 😂 I'll start posting more on LinkedIn from now on. h/t Wing Lian (caseus), who of course already merged this into Axolotl 👏 github.com/linkedin/Liger…

Yam Peleg (@yampeleg) 's Twitter Profile Photo

What is the most random event you can make up right now?

Exactly!

LinkedIn just dropped the highest-performance GPU kernels (Triton) for training LLMs.

( WTF?? 😅 )

Throughput up by up to 20%
Mem reduced by up to 60%

Out-of-the-box support for HF models.

github.com/linkedin/Liger…
Byron Hsu (@hsu_byron) 's Twitter Profile Photo

(1/n)

Training LLMs can be hindered by out-of-memory errors when scaling batch size and sequence length. Add one line to boost multi-GPU training throughput by 20% and reduce memory usage by 60%. Introducing Liger-Kernel: Efficient Triton Kernels for LLM Training.

github.com/linkedin/Liger…
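
For context, the "one line" is a monkey-patch call applied before the model is built. A minimal sketch, assuming the `apply_liger_kernel_to_llama()` API from the repo's README and an example Llama 3 checkpoint:

```python
# Minimal sketch of the "one line" idea, assuming the apply_liger_kernel_to_llama()
# patching API described in the Liger Kernel README; checkpoint name is an example.
import torch
from transformers import AutoModelForCausalLM
from liger_kernel.transformers import apply_liger_kernel_to_llama

# The single extra line: monkey-patch Hugging Face's Llama modules (RMSNorm, RoPE,
# SwiGLU, cross entropy) with Liger's Triton kernels before instantiating the model.
apply_liger_kernel_to_llama()

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",   # example checkpoint; any HF Llama model works
    torch_dtype=torch.bfloat16,
)
# The rest of the training loop is unchanged; only the underlying kernels differ.
```
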
Byron Hsu (@hsu_byron) 's Twitter Profile Photo

If you've read this far, be sure to star our repo at github.com/linkedin/Liger…! We would like to thank Animesh Singh, Haowen Ning, Yanning Chen for the leadership support, shao, Qingquan Song, Yun Dai, Vignesh Kothapalli, Shivam Sahni, Zain Merchant for the

kalomaze (@kalomaze) 's Twitter Profile Photo

from intervitens > previously I was maxing out VRAM with CPU param offload, and now even without offloading, I only get 75% VRAM usage. It actually... just works™ (4B FFT on 4x3090s, bs1 @ 8192 context)

Tianle Cai @ ICLR 2025🇸🇬 (@tianle_cai) 's Twitter Profile Photo

We've heard many complaints about the high GPU memory requirements for training Medusa heads on models with large vocabularies. This is no longer an issue, thanks to the amazing Liger kernel developed by the LinkedIn team (Byron Hsu and team)

The Liger kernel cleverly fuses the
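
The truncated line above presumably refers to Liger fusing the lm_head projection with the cross-entropy loss. As a rough, plain-PyTorch illustration of why that helps with large vocabularies (a conceptual sketch, not the actual Triton kernel): chunking the projection means the full [num_tokens, vocab_size] logits tensor is never materialized at once.

```python
# Conceptual sketch in plain PyTorch (not Liger's Triton code): compute the
# lm_head projection and cross entropy chunk by chunk so the full
# [num_tokens, vocab_size] logits tensor never exists in memory at once.
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(hidden, lm_head_weight, targets, chunk_size=1024):
    """hidden: [N, H], lm_head_weight: [V, H], targets: [N] -> mean loss."""
    total = hidden.new_zeros(())
    n = hidden.shape[0]
    for start in range(0, n, chunk_size):
        logits = hidden[start:start + chunk_size] @ lm_head_weight.t()  # [C, V]
        total = total + F.cross_entropy(
            logits, targets[start:start + chunk_size], reduction="sum"
        )
    return total / n

# Small example: 4k tokens, hidden size 1024, 32k vocabulary.
hidden = torch.randn(4096, 1024)
weight = torch.randn(32_000, 1024)
targets = torch.randint(0, 32_000, (4096,))
print(chunked_linear_cross_entropy(hidden, weight, targets))

# A real fused kernel goes further: it computes the logits' gradients inside the
# kernel, so the per-chunk logits don't even need to be kept for the backward pass.
```
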
LLaMA Factory (@llamafactory_ai) 's Twitter Profile Photo

We've integrated the Liger Kernel into LLaMA-Factory.

It achieves ~10% speed up and ~25% memory reduction when fine-tuning Llama-3 8B on 2k sequences. Try it out at LLaMA-Factory🚀
Byron Hsu (@hsu_byron) 's Twitter Profile Photo

Liger Kernel is officially supported in SFTTrainer, the most popular trainer for LLM fine-tuning. Add `--use_liger_kernel` to supercharge the training with one flag.

Hugging Face Thomas Wolf Lewis Tunstall
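
A hedged sketch of what that looks like from Python rather than the CLI, assuming the flag maps to `use_liger_kernel=True` on the trainer config (TRL's `SFTConfig` inherits it from `transformers.TrainingArguments`); the toy dataset and hyperparameters are placeholders:

```python
# Hedged sketch: the CLI flag is assumed to map to use_liger_kernel=True on the
# trainer config; the dataset and hyperparameters here are placeholders only.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

train_dataset = Dataset.from_dict(
    {"text": ["Liger kernels cut memory.", "One flag enables them."]}
)

config = SFTConfig(
    output_dir="sft-liger-demo",
    use_liger_kernel=True,          # the one flag from the tweet
    per_device_train_batch_size=2,
    max_steps=1,
    bf16=True,
)

trainer = SFTTrainer(
    model="meta-llama/Meta-Llama-3-8B",  # example checkpoint
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```
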
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

LinkedIn's great Liger Kernel repo just released an updated version.

- New Integrations: SFTTrainer, Axolotl, LLaMA-Factory
- New Model Support: Phi3 & Qwen2
- AutoModel API: Meet AutoLigerKernelForCausalLM (see the sketch below)
- Enhanced FusedLinearCrossEntropy: supports a bias term
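
A short sketch of the AutoModel-style entry point named in the list above, assuming `AutoLigerKernelForCausalLM` is exported from `liger_kernel.transformers` as the release notes describe; the checkpoint is just an example from the newly supported model families:

```python
# Sketch of the AutoModel-style entry point; assumes AutoLigerKernelForCausalLM
# is exported from liger_kernel.transformers as described in the release notes.
import torch
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# Drop-in replacement for AutoModelForCausalLM: it detects a supported
# architecture (e.g. Llama, Phi3, Qwen2) and applies the matching Liger patch.
model = AutoLigerKernelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B",                 # example checkpoint
    torch_dtype=torch.bfloat16,
)
```
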
Anush Elangovan (@anushelangovan) 's Twitter Profile Photo

PyTorch zoom backend: an experimental Triton-first integration into PyTorch eager mode where the kernels are written in Triton (instead of CUDA or HIP). Uses Liger kernels now but can use any Triton kernel. Runs llamas. Hack away at it. Thoughts? github.com/nod-ai/pytorch…
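
For readers unfamiliar with Triton, here is a generic elementwise-add kernel (not taken from the zoom backend repo) showing what "kernels written in Triton instead of CUDA or HIP" looks like when launched from PyTorch:

```python
# Generic illustration, not code from the zoom backend repo: a minimal Triton
# elementwise-add kernel launched from PyTorch.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                        # one program instance per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                        # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def triton_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(1 << 20, device="cuda")
y = torch.randn(1 << 20, device="cuda")
assert torch.allclose(triton_add(x, y), x + y)
```
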

MLOps Community (@mlopscommunity) 's Twitter Profile Photo

We had the pleasure of chatting with Animesh Singh in our latest podcast episode! Animesh is the Director of GPU Infrastructure at LinkedIn and has a wealth of insights on scaling LLMs and optimizing GPU infrastructure.
