NicoNico🦇🔊
@niangao_g
🎓 PhD student @ Hasso Plattner Institute | 🤖 Crafting smarter, not harder AI | Exploring the edges of efficiency
ID: 802190125757124608
25-11-2016 16:40:04
592 Tweets
168 Followers
2.2K Following
Happy to share DeepMixtral-8x7b-Instruct, a direct extraction/transfer of Mixtral Instruct's experts into Deepseek's architecture. Performance is identical, if not a bit better, and the model seems more malleable to training. Collaborators: Eric Hartford, Fernando Fernandes Neto.