
Elias Frantar
@elias_frantar
Researcher @OpenAI | prev. PhD @ISTAustria and intern @GoogleDeepmind | I also build super fast Lego Rubik's Cube robots.
ID: 3037495037
https://efrantar.github.io/ 14-02-2015 20:01:25
76 Tweets
489 Followers
128 Following

Exciting news from our latest LLM compression research! 🚀 Together with ISTAustria and @neuralmagic, we’ve been exploring sparse finetuning for LLMs and achieved 7.7 tokens/second on a single core and 26.7 tokens/second on 4 cores of an AMD Ryzen CPU! (1/n)


Happy to release QUIK, a new accurate post-training quantization method which processes the majority of weights and activations using 4-bit precision. [1/N] With Saleh Ashkboos, Elias Frantar, Torsten Hoefler 🇨🇭 Paper: arxiv.org/abs/2310.09259 Code: github.com/IST-DASLab/QUIK Snapshot:
![Dan Alistarh (@dalistarh) on Twitter photo](https://pbs.twimg.com/media/F-QiTrIXoAACX14.jpg)

AutoGPTQ 0.7.0 is released and includes Elias Frantar's Marlin kernel for int4*fp16 matrix multiplication on Ampere GPUs. Check out github.com/AutoGPTQ/AutoG… - This is usable with any int4 quantized Transformers model (symmetric quantization, no act-order) directly from the Hub!🧵

Happy to release the write-up on the MARLIN kernel for fast LLM inference, now supporting 2:4 sparsity! Led by Elias Frantar & Roberto López Castro Paper: arxiv.org/abs/2408.11743 Code: github.com/IST-DASLab/Spa… MARLIN is integrated with vLLM thanks to @neuralmagic!