Tijmen Blankevoort (@tirune)'s Twitter Profile
Tijmen Blankevoort

@tirune

ID: 41835547

Joined: 22-05-2009 15:39:21

471 Tweets

446 Followers

161 Following

Qualcomm Research & Technologies (@qcomresearch):

Low-bit integers are the go-to format for #AI model efficiency. When does floating point perform better? Our NeurIPS-accepted paper "FP8 Quantization: The Power of the Exponent" tackles this question. Tijmen Blankevoort, Mart, Jorn Peters, Markus Nagel bit.ly/3V50VPL

Qualcomm Research & Technologies (@qcomresearch):

Wondering how it's possible to run very large #AI models such as Stable Diffusion and GPT on device? Our Qualcomm AI Research team has compared the most popular integer and floating point formats and weighs in on which one is most efficient: qualcomm.com/news/onq/2023/…

Davis Blalock (@davisblalock):

"FP8 versus INT8 for efficient deep learning inference" Is fp8 just plain better than int8? No. There are tradeoffs between the two at various levels of the stack, and this paper digs into their strengths and weaknesses. [1/11]

"FP8 versus INT8 for efficient deep learning inference"

Is fp8 just plain better than int8? 

 No. There are tradeoffs between the two at various levels of the stack, and this paper digs into their strengths and weaknesses. [1/11]
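To make the tradeoff in these threads concrete, here is a minimal sketch (not from the tweets; it assumes PyTorch 2.1+ for the `float8_e4m3fn` dtype): INT8's uniform grid suits well-behaved weight distributions, while FP8's exponent bits tolerate outliers at the cost of precision elsewhere. Real deployments also weigh per-channel scales and hardware cost, which this toy comparison ignores.

```python
import torch

def int8_roundtrip(x):
    # Symmetric per-tensor INT8: one scale, uniform spacing between levels.
    scale = x.abs().max() / 127.0
    return torch.clamp(torch.round(x / scale), -128, 127) * scale

def fp8_roundtrip(x):
    # Round-trip through PyTorch's 4-exponent/3-mantissa float8 dtype.
    return x.to(torch.float8_e4m3fn).to(torch.float32)

torch.manual_seed(0)
well_behaved = torch.randn(10_000)
with_outliers = well_behaved.clone()
with_outliers[::1000] *= 50.0  # a handful of large outliers, as in LLM layers

for name, x in [("well-behaved", well_behaved), ("with outliers", with_outliers)]:
    mse_int8 = torch.mean((x - int8_roundtrip(x)) ** 2).item()
    mse_fp8 = torch.mean((x - fp8_roundtrip(x)) ** 2).item()
    print(f"{name:>13}: INT8 MSE={mse_int8:.6f}  FP8 MSE={mse_fp8:.6f}")
```

With outliers present, the single INT8 scale stretches to cover them and the bulk of the values lose precision, while FP8's error stays roughly relative; on the well-behaved tensor the ranking flips.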
Babak Ehteshami Bejnordi (@babakeht):

We propose a dynamic tokenizer for ViTs, where the scale at which an image is processed varies based on the complexity of the image area. This means less compute for simple areas and more for complex, cluttered areas. Thanks to Amelie Royer, Jakob Havtorn, Tijmen Blankevoort
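
A toy, purely illustrative version of that idea (my own sketch, not the paper's tokenizer, which learns this decision with a trained module rather than a pixel-variance heuristic):

```python
import torch

def dynamic_patches(image, coarse=32, fine=16, var_threshold=0.01):
    """Tile the image into coarse patches; re-tile high-variance
    (cluttered) patches at a finer scale, so flat regions cost one
    token and busy regions cost several."""
    C, H, W = image.shape
    tokens = []
    for y in range(0, H, coarse):
        for x in range(0, W, coarse):
            patch = image[:, y:y + coarse, x:x + coarse]
            if patch.var() < var_threshold:
                tokens.append(patch)  # single coarse token for a flat area
            else:
                for fy in range(0, coarse, fine):  # fine tokens for clutter
                    for fx in range(0, coarse, fine):
                        tokens.append(patch[:, fy:fy + fine, fx:fx + fine])
    return tokens

img = torch.zeros(3, 224, 224)
img[:, :, 112:] = torch.rand(3, 224, 112)  # flat left half, noisy right half
print(len(dynamic_patches(img)), "tokens vs", (224 // 16) ** 2, "at a fixed fine scale")
```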

Tijmen Blankevoort (@tirune):

Talking in this Dutch podcast about how a general AI, something that can do many tasks just like a human (and perhaps better), might not be as far away as you might think. 😃 Specifically, RL+LLMs have the potential to supercharge current model performance.

Yuki (@y_m_asano):

Very happy to announce that VeRA is accepted at ICLR 2024 with scores 8,8,8,5! VeRA makes LoRA ~10x more parameter-efficient while retaining the same performance & also works for vision! Paper: arxiv.org/abs/2310.11454 Our very light-weight webpage😏: dkopi.github.io/vera/

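For readers wondering where the saving comes from, here is a minimal single-layer sketch of the mechanism from the paper (my simplification; the paper additionally shares A and B across layers): the projections are frozen random matrices, and only two small scaling vectors are trained.

```python
import torch
import torch.nn as nn

class VeRALinear(nn.Module):
    """Minimal sketch of a VeRA adapter (arxiv.org/abs/2310.11454):
    h = W x + Lambda_b B Lambda_d A x, with A and B frozen random
    projections and only the vectors d and b trainable."""

    def __init__(self, base: nn.Linear, r: int = 256):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        d_out, d_in = base.weight.shape
        # Frozen random projections, never updated.
        self.A = nn.Parameter(torch.randn(r, d_in) / d_in**0.5, requires_grad=False)
        self.B = nn.Parameter(torch.randn(d_out, r) / r**0.5, requires_grad=False)
        # Trainable per-dimension scales: only r + d_out parameters.
        self.d = nn.Parameter(torch.full((r,), 0.1))
        self.b = nn.Parameter(torch.zeros(d_out))  # zero init => no change at start

    def forward(self, x):
        return self.base(x) + self.b * ((self.d * (x @ self.A.T)) @ self.B.T)

layer = VeRALinear(nn.Linear(768, 768), r=256)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 256 + 768 = 1024 trainable parameters for this layer
```
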
Tycho van der Ouderaa (@tychovdo):

⭐️New paper⭐️ Excited to share 'The LLM Surgeon', accepted at ICLR 2024. We obtain SOTA pruning performance and even demonstrate structured LLM pruning of full rows and columns, with direct practical impact: compression of up to 20-30% with negligible loss in performance. 🧵1/9👇

Mart (@martvanbaalen):

Our work on Vector Quantization for SOTA size vs accuracy trade-offs in LLMs is on arXiv! Thanks to co-authors Andrey Kuzmin, Markus Nagel, Peter Couperus, Cedric Bastoul, Eric Mahurin, Tijmen Blankevoort and Paul Whatmough for their hard work. And thanks to AK for amplifying!
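
The core idea of vector quantization for weights, in a toy sketch (mine, not the paper's algorithm, which adds data-aware refinements on top of plain k-means): groups of weights are replaced by indices into a shared codebook.

```python
import torch

def vq_weights(W, dim=2, k=256, iters=20):
    """Quantize a weight matrix by splitting it into `dim`-sized
    sub-vectors and snapping each to its nearest k-means centroid.
    Each sub-vector is then stored as one 8-bit index (k=256),
    i.e. 4 bits per weight here, plus the small codebook."""
    vecs = W.reshape(-1, dim)
    codebook = vecs[torch.randperm(len(vecs))[:k]].clone()  # init from data
    for _ in range(iters):
        assign = torch.cdist(vecs, codebook).argmin(dim=1)  # nearest centroid
        for j in range(k):
            members = vecs[assign == j]
            if len(members):
                codebook[j] = members.mean(dim=0)           # recenter
    return codebook[assign].reshape(W.shape), assign

W = torch.randn(256, 256)
W_q, idx = vq_weights(W)
print("reconstruction MSE:", torch.mean((W - W_q) ** 2).item())
```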

Jeremy Howard (@jeremyphoward):

Today, with Tim Dettmers, Hugging Face, & @mobius_labs, we're releasing FSDP/QLoRA, a new project that lets you efficiently train very large (70b) models on a home computer with consumer gaming GPUs. 1/🧵 answer.ai/posts/2024-03-…
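
For context, the QLoRA half of the recipe looks roughly like this (a hedged sketch: the model name and hyperparameters are illustrative, and running it needs `bitsandbytes` plus a CUDA GPU; the FSDP sharding that makes the 70b-on-gaming-GPUs part work is the project's own wiring and lives in their repo):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 base weights with bf16 compute, then LoRA adapters on top.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # illustrative; any causal LM works
    quantization_config=bnb,
)
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapters train
```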

Tycho van der Ouderaa (@tychovdo):

Our paper 'The LLM Surgeon', accepted at ICLR 2024, achieves SOTA LLM pruning across unstructured, semi-structured, and the most challenging yet most effective setting: structured pruning that removes entire matrix rows/columns. Happy to share that the code is now publicly available.
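
What structured pruning of full rows and columns buys you, in a toy sketch (mine; the paper scores and updates weights with Kronecker-factored curvature information, whereas this uses a crude column-norm ranking just to show the mechanics): the weight tensor genuinely shrinks, so the speedup needs no sparse kernels.

```python
import torch
import torch.nn as nn

def prune_columns(layer, keep_frac=0.75):
    """Drop whole input columns of a Linear layer, returning a smaller
    dense layer plus the kept indices (the preceding layer must drop
    the matching output rows)."""
    W = layer.weight.data                       # shape (d_out, d_in)
    n_keep = int(W.shape[1] * keep_frac)
    keep = W.norm(dim=0).topk(n_keep).indices.sort().values
    pruned = nn.Linear(n_keep, W.shape[0], bias=layer.bias is not None)
    pruned.weight.data = W[:, keep].clone()
    if layer.bias is not None:
        pruned.bias.data = layer.bias.data.clone()
    return pruned, keep

layer = nn.Linear(1024, 1024)
pruned, keep = prune_columns(layer, keep_frac=0.75)
print(pruned.weight.shape)  # torch.Size([1024, 768]): a 25% smaller matmul
```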

Yuki (@y_m_asano):

Another week, another release: Our PEFT method VeRA (LoRA but 10-100x fewer parameters thanks to random projections) is now on HF PEFT! So now's a good time to `pip install peft`. Thx to Alex McKinney, Benjamin Bossan, Kopi Or Tea ? + Tijmen Blankevoort huggingface.co/docs/peft/pack…

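A hedged usage sketch (assuming a recent peft release with VeRA support; the model and hyperparameters here are illustrative, not from the tweet):

```python
from transformers import AutoModelForCausalLM
from peft import VeraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
# VeRA targets modules of matching shape so the frozen random
# projections can be shared across layers.
config = VeraConfig(r=256, target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # far fewer trainables than LoRA at equal r
```
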
Yuki (@y_m_asano):

Today we introduce Bidirectional Instruction Tuning (Bitune). It's a new way of adapting LLMs for the instruction->answering stage. It allows the model to process the instruction/question with bidirectional attention, while the answer generation remains causal.

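The attention pattern described above, as a minimal mask-construction sketch (my illustration of the mask idea only; the full method also combines causal and bidirectional features with learned mixing):

```python
import torch

def bitune_style_mask(prompt_len, total_len):
    """Boolean attention mask (True = may attend): instruction tokens
    attend to each other bidirectionally, answer tokens stay causal."""
    mask = torch.tril(torch.ones(total_len, total_len, dtype=torch.bool))
    mask[:prompt_len, :prompt_len] = True  # full attention within the prompt
    return mask

print(bitune_style_mask(prompt_len=3, total_len=6).int())
```
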
Nathan Benaich (@nathanbenaich):

🪩The State of AI 2024 has landed! 🪩 Our seventh installment is our biggest and most comprehensive yet, covering everything you *need* to know about research, industry, safety and politics. As ever, here's my director’s cut (+ video tutorial!) 🧵

Yuki (@y_m_asano):

So you think your ICLR 2026 rejection was surprising? We nearly fell out of our chairs when our Bitune paper, with a 7.25 average rating (10,8,6,5 -- i.e. top 4%), got rejected 😅. It's not like new points or problems surfaced... Just ¯\_(ツ)_/¯ I guess? Sharing this so that especially…

Zechun Liu (@zechunliu):

🚀 We're thrilled to announce that the SoTA low-bit quantization ParetoQ code is now open-source! 🌟 github.com/facebookresear…

🔍 What does this repo support?
🌟 State-of-the-art sub-4-bit quantization: a significant upgrade from our previous LLM-QAT repo, outperforming all…

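The basic mechanism behind sub-4-bit quantization-aware training, in a stand-alone sketch (a generic straight-through estimator, not ParetoQ's actual recipe, which tunes grids, scales, and training budgets per bit-width):

```python
import torch

class FakeQuant(torch.autograd.Function):
    """Straight-through estimator (STE) fake quantizer: the forward
    pass rounds weights to a symmetric low-bit grid, the backward pass
    passes gradients through unchanged so the full-precision weights
    keep learning despite the non-differentiable rounding."""

    @staticmethod
    def forward(ctx, w, bits=2):
        qmax = 2 ** (bits - 1) - 1          # 1 for 2-bit, 7 for 4-bit
        scale = w.abs().max() / qmax
        return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None               # STE: identity gradient

w = torch.randn(64, 64, requires_grad=True)
loss = (FakeQuant.apply(w, 2) ** 2).sum()
loss.backward()
print(w.grad.abs().sum() > 0)               # gradients flow despite rounding
```
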
Tijmen Blankevoort (@tirune):

I recently made the news because of a doc I wrote in Meta’s GenAI organization. ‘The Information’ wrote about it as if I did a big raging ‘mic drop’ before leaving the company. Nothing could be further from the truth - so setting the record straight here. open.substack.com/pub/blankevoor…