Zechun Liu (@zechunliu)'s Twitter Profile
Zechun Liu

@zechunliu

Research Scientist @Meta, Visiting Scholar @CarnegieMellon, PhD from @HKUST, Undergrad @FudanUniv

ID: 1672856688062500864

Joined: 25-06-2023 06:38:30

30 Tweets

289 Followers

69 Following

Zechun Liu (@zechunliu):

Excited to announce the #iccv2023 workshop on Low-Bit Quantized Neural Networks (LBQNN)!

Call for papers: sites.google.com/view/lbqnn-icc… (by August 1, 2023).

Topics range from computer vision to language models. Any work related to low-bit quantization is welcome.
Yunyang Xiong (@youngxiong1):

Our vision-language LLM, MiniGPT-v2, achieves state-of-the-art performances on a broad range of vision-language tasks compared with recent generalist models. Try our demo at minigpt-v2.github.io.

Zechun Liu (@zechunliu):

🤩MobileLLM source code is available on github.com/facebookresear… !
🎊Besides the MobileLLM-125M/350M models reported in the original paper, we also included results for MobileLLM-600M/1B/1.5B. Please kindly check our repo.
🌟Paper: arxiv.org/abs/2402.14905
Zechun Liu (@zechunliu):

🎯As an extension of SpinQuant, we propose RoLoRA, which integrates rotation into QAT (LoRA + quantization).
🎊It achieves a 29.5-point improvement for 4-bit weight-activation quantized LLaMA2-13B on commonsense reasoning tasks compared to the baseline.
🌟Paper: arxiv.org/pdf/2407.08044
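
Not from the papers themselves, just a toy numpy sketch of the identity these rotation methods rely on: for an orthogonal matrix R, (XR)(RᵀW) = XW, so rotating activations and weights before quantization leaves the full-precision output unchanged while spreading outlier channels, which tends to shrink quantization error.

```python
# Toy numpy sketch (not the SpinQuant/RoLoRA code): for an orthogonal R,
# (X R)(R^T W) == X W, so rotating before quantization preserves the
# full-precision output while spreading outlier channels around.
import numpy as np

rng = np.random.default_rng(0)
d = 64
X = rng.standard_normal((8, d))     # activations (batch x hidden)
X[:, 0] *= 50.0                     # inject an outlier channel
W = rng.standard_normal((d, d))     # weights

R, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal rotation

# Exact invariance in full precision
assert np.allclose((X @ R) @ (R.T @ W), X @ W)

def fake_quant(A, bits=4):
    """Per-tensor symmetric min-max quantization (round-trip to float)."""
    scale = np.abs(A).max() / (2 ** (bits - 1) - 1)
    return np.round(A / scale) * scale

err_plain = np.linalg.norm(fake_quant(X) @ fake_quant(W) - X @ W)
err_rot = np.linalg.norm(fake_quant(X @ R) @ fake_quant(R.T @ W) - X @ W)
print(f"4-bit error without rotation: {err_plain:.1f}")
print(f"4-bit error with rotation:    {err_rot:.1f}")  # typically far smaller
```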

Zechun Liu (@zechunliu):

🎉I'm excited to share that SpinQuant supported the live demo at Meta Connect! We just made our 4-bit quantized LLaMA SpinQuant model publicly available. Check it out if you're interested: ai.meta.com/blog/meta-llam…

Yunyang Xiong (@youngxiong1):

🚨VideoLLM from Meta!🚨
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

📝Paper: huggingface.co/papers/2410.17…
🧑🏻‍💻Code: github.com/Vision-CAIR/Lo…
🚀Project (Demo): vision-cair.github.io/LongVU

We propose LongVU, a video LLM with a spatiotemporal adaptive
Zechun Liu (@zechunliu):

🚀We're thrilled to announce that the MobileLLM weights are available on HuggingFace: huggingface.co/collections/fa…

📱MobileLLM is a state-of-the-art language model designed for mobile devices: arxiv.org/abs/2402.14905

🔥Explore the pretraining code on GitHub: github.com/facebookresear…
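
A minimal loading sketch, not from the repo itself; the repo id facebook/MobileLLM-125M and the trust_remote_code flag are assumptions based on the linked collection:

```python
# Minimal loading sketch. The repo id "facebook/MobileLLM-125M" and the
# trust_remote_code flag are assumptions based on the linked collection,
# not confirmed by this post.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "facebook/MobileLLM-125M"  # assumed repo id
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tok("Small models can run", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```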
Zechun Liu (@zechunliu):

Thanks to Yann LeCun for promoting our work. 🎉 MobileLLM models at sizes 125M, 350M, and 600M are now available on HuggingFace! 🚀 huggingface.co/collections/fa…

Yunyang Xiong (@youngxiong1):

🚀Excited to share our Efficient Track Anything. 
It is small but mighty, >2x faster than SAM2 on A100 and runs > 10 FPS on iPhone 15 Pro Max. How’d we do it? EfficientSAM + Efficient Memory Attention!

Paper: arxiv.org/pdf/2411.18933
Project (demo): yformer.github.io/efficient-trac…

with:
Forrest Iandola (@fiandola):

[1/n] 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗧𝗿𝗮𝗰𝗸 𝗔𝗻𝘆𝘁𝗵𝗶𝗻𝗴 from Meta: interactive video segmentation and tracking on an iPhone!

Yuandong Tian (@tydsh):

We introduce ParetoQ, a series of pre-trained models that achieve SoTA in ternary (1.58-bit) and 2/3/4-bit quantization for SLMs (up to 3B parameters), using initial full pre-training followed by QAT. In addition, we also discover that the representation changes substantially after low-bit
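
For illustration, a generic QAT sketch in PyTorch, not the ParetoQ recipe: the forward pass sees fake-quantized weights, while a straight-through estimator (STE) passes gradients to the latent full-precision weights as if rounding were the identity.

```python
# Generic quantization-aware training (QAT) sketch in PyTorch. Illustrative
# only, NOT the ParetoQ recipe. Forward uses fake-quantized weights; the
# straight-through estimator (STE) passes gradients through round() unchanged.
import torch
import torch.nn as nn

class FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, bits):
        qmax = 2 ** (bits - 1) - 1                    # bits=2 -> levels {-1, 0, 1}
        scale = w.abs().max().clamp(min=1e-8) / qmax  # symmetric per-tensor scale
        return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None                         # STE: identity gradient for w

class QATLinear(nn.Linear):
    def __init__(self, in_features, out_features, bits=2):
        super().__init__(in_features, out_features)
        self.bits = bits

    def forward(self, x):
        w_q = FakeQuant.apply(self.weight, self.bits)  # quantize on the fly
        return nn.functional.linear(x, w_q, self.bias)

# One training step: gradients update the latent full-precision weights.
layer = QATLinear(16, 16, bits=2)
loss = layer(torch.randn(4, 16)).pow(2).mean()
loss.backward()
print(layer.weight.grad.shape)  # torch.Size([16, 16])
```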

Beidi Chen (@beidichen):

⏰📢After years of working on long-context efficiency, I've started to doubt whether it's truly necessary (many of you have probably noticed the decline of interest in long-context LLMs). Despite strong models like Gemini, short-context + retrieval often does the trick: faster, cheaper, and

Zechun Liu (@zechunliu):

🚀 We’re releasing ParetoQ, a family of quantized MobileLLMs — ultra-efficient, performance-retaining models for edge devices. 🧠 Smallest model: 1-bit, 125M → only 16MB on disk 📈 1.58-bit 600M even beats 1.58-bit 3B from BitNet(1-bit Era paper) 🔥 👉 Models:

🚀 We’re releasing ParetoQ, a family of quantized MobileLLMs — ultra-efficient, performance-retaining models for edge devices.

🧠 Smallest model: 1-bit, 125M → only 16MB on disk
📈 1.58-bit 600M even beats 1.58-bit 3B from BitNet (the 1-bit era paper) 🔥

👉 Models:
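
The 16MB figure is simple arithmetic: 125M parameters × 1 bit ÷ 8 ≈ 15.6MB, before quantization scales and metadata. A quick back-of-envelope check:

```python
# Back-of-envelope checkpoint sizes for a 125M-parameter model,
# ignoring quantization scales and metadata (a small extra overhead).
params = 125e6
for bits in (1, 1.58, 2, 3, 4, 16):
    mb = params * bits / 8 / 1e6    # bits -> bytes -> MB
    print(f"{bits:>5} bits: {mb:7.1f} MB")
# 1 bit -> 15.6 MB, consistent with the ~16 MB on-disk figure above.
```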
PyTorch (@pytorch):

Quantization of large language models aims to cut compute and memory needs while keeping performance. 𝐏𝐚𝐫𝐞𝐭𝐨𝐐 delivers SOTA results across bit-widths, showing 1.58-, 2-, and 3-bit quantization offer better size-accuracy trade-offs than 4-bit. 💡 Read more: