Haihao Shen (@haihaoshen)'s Twitter Profile
Haihao Shen

@haihaoshen

Creator of Intel Neural Compressor/Speed/Coder, Intel Ext. for Transformers, AutoRound; HF Optimum-Intel Maintainer; Founding member of OPEA; Opinions my own

ID: 1438706609400651777

Link: https://github.com/intel/intel-extension-for-transformers | Joined: 17-09-2021 03:29:57

489 Tweets

3.3K Followers

2.2K Following

Haihao Shen (@haihaoshen)'s Twitter Profile Photo

We received the first batch of responses on LLM low-bit quantization. Congrats to AutoGPTQ as the winner! ❓Here comes Question 2, with your help needed: if AutoRound outperforms the others, will you try it now? See reference: arxiv.org/pdf/2309.05516

🔥AutoRound now supports INT4 quantization for multi-modal models! Starting with LLaVA (Haotian Liu), with higher accuracy than other popular approaches (e.g., llama.cpp). 🎯GitHub: github.com/intel/auto-rou… (star the project if you like it).

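The INT4 quantization mentioned throughout these posts can be sketched concretely. Below is a minimal, self-contained illustration of group-wise asymmetric round-to-nearest (RTN) INT4 quantization — the simple baseline that tools like AutoRound improve on. This is not AutoRound's actual algorithm, and the group size of 128 is just a common convention.

```python
# Sketch of group-wise asymmetric INT4 round-to-nearest (RTN) quantization.
# This is an illustration of the baseline that AutoRound/GPTQ-style methods
# improve on, NOT AutoRound's actual algorithm.
import numpy as np

def quantize_int4(weights, group_size=128):
    """Quantize a 1-D float array to INT4 codes (0..15), one scale/zero per group."""
    w = weights.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 16 levels -> 15 steps
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant groups
    q = np.clip(np.round((w - w_min) / scale), 0, 15).astype(np.uint8)
    return q, scale, w_min

def dequantize_int4(q, scale, w_min):
    """Reconstruct float weights from INT4 codes, scales, and group minimums."""
    return (q.astype(np.float32) * scale + w_min).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale, w_min = quantize_int4(w)
w_hat = dequantize_int4(q, scale, w_min)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

Methods like AutoRound keep this storage format but learn better rounding decisions per weight, which is why the INT4 accuracy rankings on the leaderboard differ so much between tools.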
Small model, big power! Happy to share that Intel NeuralChat-7B took a solid step toward improving LLM response factual consistency and reducing hallucination rates! Thanks to Huma Abidi for the great leadership and support on responsible AI, and to 吕考考 and team! #IAmIntel

Competition among low-bit quantized LLMs is fierce, like the Olympic Games! See the leaderboard: huggingface.co/spaces/Intel/l…, which shows Qwen2-7B-INC > Qwen2-7B-AWQ > Llama3.1-8B-INC > Qwen2-7B-GPTQ > ... — the algorithm matters a lot. Visit the leaderboard before choosing your model.

🎇PyTorch now supports Intel GPUs to accelerate AI workloads! Congrats team!! Check out the blog: intel.com/content/www/us… cc Raja Koduri

🎯Continuing the thread on LLM low-bit quantization for Llama3.1: INC (Intel Neural Compressor), AWQ, and BnB all provide day-0 support, but their INT4 accuracy differs *significantly*. See the diagram below, and more details on the leaderboard: huggingface.co/spaces/Intel/l…

🔥Excited to share that the 🥇 Llama3.1 INT4 model uses Intel Neural Compressor (INC). Congrats to the #Intel INC team! Also thanks for all the submissions! 🎯Low-bit LLM Leaderboard: huggingface.co/spaces/Intel/l…

🥳Happy to share with you that ONNX Neural Compressor (ONC) v1.0 is officially released and is now available in the ONNX community: github.com/onnx/neural-co…. ONC inherits from Intel Neural Compressor (INC) with a clear focus on compression support for ONNX models. Congrats, INC team!

🥳Super excited to share that Gaudi SW v1.17 is officially released. One of the highlighted features is FP8 and INT4 inference using Intel Neural Compressor. 🎯Check out the 1.17.0 documentation (habana.ai) to get started! Gaudi is the only alternative to NVIDIA GPUs now!

🎯Happy to share with you an awesome video from Fahd Mirza on LLM INT4 quantization using AutoRound (part of INC)! AutoRound is your go-to LLM quantization tool, particularly for INT4 quantization with the highest model accuracy! 🔥Check out the video: youtube.com/watch?v=khekPv…

I am honored to be part of OPEA and to have the opportunity to lead the OPEA architecture. OPEA is a great choice when building enterprise AI applications!

🔥Super excited to share with you a nice blog post from Benjamin Marie: Intel AutoRound: Accurate Low-bit Quantization for LLMs (link: kaitchup.substack.com/p/intel-autoro…). Thanks, Benjamin! 🎯AutoRound: github.com/intel/auto-rou…

🎯Super interesting to see the 4-bit quantization tool ranking, like the Olympic Games:
🥇AutoRound
🥈Bitsandbytes
🥉HQQ, GPTQ, AQLM

🔥Additional info from the low-bit LLM leaderboard: huggingface.co/spaces/Intel/l…

🔥INC + Gaudi: accelerating LLM performance on Intel Gaudi with FP8 and INT4 low precision powered by INC 🎯Check out the blog: medium.com/intel-analytic…
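For context on the FP8 inference mentioned above, below is a minimal simulation of rounding a single float to the E4M3 format (one of the two common FP8 encodings: 4 exponent bits, 3 mantissa bits, exponent bias 7, max normal value 448). This is my own sketch of the number format only — INC's real FP8 flow operates on whole tensors and applies scaling before the cast.

```python
# Simulate rounding a float to FP8 E4M3 -- a sketch of the number format
# only, not INC's actual FP8 pipeline (which also applies per-tensor or
# per-channel scaling before the cast).
import math

def fp8_e4m3(x: float) -> float:
    """Round x to the nearest representable E4M3 value (clamped, no NaN handling)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    # Exponent of the leading bit, clamped at -6 (the subnormal threshold).
    e = max(math.floor(math.log2(mag)), -6)
    # Quantize the significand to 3 mantissa bits (multiples of 2^e / 8).
    m = round(mag / 2.0 ** e * 8) / 8.0
    val = m * 2.0 ** e
    # Clamp to the largest representable E4M3 magnitude.
    return sign * min(val, 448.0)
```

Because the step between representable values grows with magnitude, FP8 keeps fine resolution near zero but coarse resolution for large activations — which is why the scaling step that INC applies before the cast matters so much for accuracy.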

Intel Neural Compressor + AutoRound provide powerful quantization support and enable efficient MLPerf inference on Xeon!! INC: github.com/intel/neural-c… AutoRound: github.com/intel/auto-rou…

Thanks to Rohan Paul for trying AutoRound! I am proud of the team who created AutoRound and contributed such a great quantization tool to the LLM community!!