Haihao Shen (@haihaoshen)'s Twitter Profile
Haihao Shen

@haihaoshen

Creator of Intel Neural Compressor/Speed/Coder, Intel Ext. for Transformers, AutoRound; HF Optimum-Intel Maintainer; Founding member of OPEA; Opinions my own

ID: 1438706609400651777

Link: https://github.com/intel/intel-extension-for-transformers | Joined: 17-09-2021 03:29:57

489 Tweets

3.3K Followers

2.2K Following

Haihao Shen (@haihaoshen)'s Twitter Profile Photo

We received the first batch of responses on LLM low-bit quantization. Congrats to AutoGPTQ as the winner! ❓Here comes Question 2, with your help needed: if AutoRound outperforms the others, will you try it now? See reference: arxiv.org/pdf/2309.05516

🔥AutoRound now supports INT4 quantization for multi-modal models! Starting with LLaVA (Haotian Liu), with higher accuracy than other popular approaches (e.g., llama.cpp). 🎯GitHub: github.com/intel/auto-rou… (star the project if you like it).

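The INT4 quantization mentioned throughout these posts can be sketched concretely. Below is a minimal, self-contained illustration of group-wise asymmetric round-to-nearest (RTN) INT4 quantization — the simple baseline that tools like AutoRound improve on. This is not AutoRound's actual algorithm, and the group size of 128 is just a common convention.

```python
# Sketch of group-wise asymmetric INT4 round-to-nearest (RTN) quantization.
# This is an illustration of the baseline that AutoRound/GPTQ-style methods
# improve on, NOT AutoRound's actual algorithm.
import numpy as np

def quantize_int4(weights, group_size=128):
    """Quantize a 1-D float array to INT4 codes (0..15), one scale/zero per group."""
    w = weights.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 16 levels -> 15 steps
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant groups
    q = np.clip(np.round((w - w_min) / scale), 0, 15).astype(np.uint8)
    return q, scale, w_min

def dequantize_int4(q, scale, w_min):
    """Reconstruct float weights from INT4 codes, scales, and group minimums."""
    return (q.astype(np.float32) * scale + w_min).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale, w_min = quantize_int4(w)
w_hat = dequantize_int4(q, scale, w_min)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

Methods like AutoRound keep this storage format but learn better rounding decisions per weight, which is why the INT4 accuracy rankings on the leaderboard differ so much between tools.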
Small model, big power! Happy to share that Intel NeuralChat-7B took a solid step toward improving LLM response factual consistency and reducing hallucination rates! Thanks to Huma Abidi for the great leadership and support on responsible AI, and to 吕考考 and team! #IAmIntel

Competition among low-bit quantized LLMs is fierce, like the Olympic Games! See the leaderboard: huggingface.co/spaces/Intel/l…, which shows Qwen2-7B-INC > Qwen2-7B-AWQ > Llama3.1-8B-INC > Qwen2-7B-GPTQ > ... — the algorithm matters a lot. Visit the leaderboard before choosing your model.

🎇PyTorch now supports Intel GPUs to accelerate AI workloads! Congrats team!! Check out the blog: intel.com/content/www/us… cc Raja Koduri

🎯Continuing the thread on LLM low-bit quantization for Llama3.1: INC (Intel Neural Compressor), AWQ, and BnB all provide day-0 support, but their INT4 accuracy differs *significantly*. See the diagram below, and more details on the leaderboard: huggingface.co/spaces/Intel/l…

🔥Excited to share that the 🥇 Llama3.1 INT4 model uses Intel Neural Compressor (INC). Congrats to the #Intel INC team! Also thanks for all the submissions! 🎯Low-bit LLM Leaderboard: huggingface.co/spaces/Intel/l…

🥳Happy to share with you that ONNX Neural Compressor (ONC) v1.0 is officially released and is now available in the ONNX community: github.com/onnx/neural-co…. ONC inherits from Intel Neural Compressor (INC) with a clear focus on compression support for ONNX models. Congrats, INC team!

🥳Super excited to share that Gaudi SW v1.17 is officially released. One of the highlighted features is FP8 and INT4 inference using Intel Neural Compressor. 🎯Check out the 1.17.0 documentation (habana.ai) to get started! Gaudi is the only alternative to NVIDIA GPUs now!

🎯Happy to share with you an awesome video from Fahd Mirza on LLM INT4 quantization using AutoRound (part of INC)! AutoRound is your go-to LLM quantization tool, particularly for INT4 quantization with the highest model accuracy! 🔥Check out the video: youtube.com/watch?v=khekPv…

I am honored to be part of OPEA and to have the opportunity to lead the OPEA architecture. OPEA is a great choice when building enterprise AI applications!

🔥Super excited to share with you a nice blog post from Benjamin Marie: Intel AutoRound: Accurate Low-bit Quantization for LLMs (link: kaitchup.substack.com/p/intel-autoro…). Thanks, Benjamin! 🎯AutoRound: github.com/intel/auto-rou…

🎯Super interesting to see the 4-bit quantization tool ranking, like the Olympic Games:
🥇AutoRound
🥈Bitsandbytes
🥉HQQ, GPTQ, AQLM

🔥Additional info from the low-bit LLM leaderboard: huggingface.co/spaces/Intel/l…

🔥INC + Gaudi: accelerating LLM performance on Intel Gaudi with FP8 and INT4 low precision powered by INC 🎯Check out the blog: medium.com/intel-analytic…
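For context on the FP8 inference mentioned above, below is a minimal simulation of rounding a single float to the E4M3 format (one of the two common FP8 encodings: 4 exponent bits, 3 mantissa bits, exponent bias 7, max normal value 448). This is my own sketch of the number format only — INC's real FP8 flow operates on whole tensors and applies scaling before the cast.

```python
# Simulate rounding a float to FP8 E4M3 -- a sketch of the number format
# only, not INC's actual FP8 pipeline (which also applies per-tensor or
# per-channel scaling before the cast).
import math

def fp8_e4m3(x: float) -> float:
    """Round x to the nearest representable E4M3 value (clamped, no NaN handling)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    # Exponent of the leading bit, clamped at -6 (the subnormal threshold).
    e = max(math.floor(math.log2(mag)), -6)
    # Quantize the significand to 3 mantissa bits (multiples of 2^e / 8).
    m = round(mag / 2.0 ** e * 8) / 8.0
    val = m * 2.0 ** e
    # Clamp to the largest representable E4M3 magnitude.
    return sign * min(val, 448.0)
```

Because the step between representable values grows with magnitude, FP8 keeps fine resolution near zero but coarse resolution for large activations — which is why the scaling step that INC applies before the cast matters so much for accuracy.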

Intel Neural Compressor + AutoRound provide powerful quantization support and enable efficient MLPerf inference on Xeon!! INC: github.com/intel/neural-c… AutoRound: github.com/intel/auto-rou…

Thanks to Rohan Paul for trying AutoRound! I am proud of the team who created AutoRound and contributed such a great quantization tool to the LLM community!!