kwindla (@kwindla) 's Twitter Profile
kwindla

@kwindla

Infrastructure and developer tools for real-time voice, video, and AI. @trydaily // ᓚᘏᗢ // @pipecat_ai

ID: 16375739

Link: https://machine-theory.com/ · Joined: 20-09-2008 07:14:14

4.4K Tweets

8.8K Followers

3.3K Following



Llama 2 70B in 20GB! 4-bit quantized, 40% of layers removed, fine-tuning to "heal" after layer removal. Almost no difference on MMLU compared to base Llama 2 70B. 

This paper, "The Unreasonable Ineffectiveness of the Deeper Layers," was my airplane reading on the way to a
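The ~20GB figure is easy to sanity-check with back-of-envelope arithmetic: 70B parameters at 4 bits each, with only 60% of the layers kept. A minimal sketch (the parameter count and layer fraction are taken from the tweet; real footprints vary with embeddings, unquantized layers, and KV cache):

```python
# Rough memory estimate for a 4-bit-quantized Llama 2 70B with 40% of layers removed.
params = 70e9            # approximate Llama 2 70B parameter count
bits_per_param = 4       # 4-bit quantization
layers_kept = 0.60       # 40% of layers pruned
bytes_total = params * bits_per_param / 8 * layers_kept
gib = bytes_total / 2**30
print(f"approx. weight memory: {gib:.1f} GiB")
```

This lands just under 20 GiB, consistent with the claim in the tweet.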