Loreto Parisi (@loretoparisi)'s Twitter Profile
Loreto Parisi

@loretoparisi

MSc EECS @UninaIT 2006. Working on #MachineLearning. #AI @musixmatchai Engineering Director @musixmatch Tweets are my own

ID:7981712

Link: https://github.com/loretoparisi · Joined: 06-08-2007 00:59:59

15.9K Tweets

1.5K Followers

1.2K Following

Salman Khan (@KhanSalmanH)'s Twitter Profile Photo

LLaMA3 and Phi3 made a splash this week in the LLM Arena. But how strong is their visual understanding ability?

⚡We release LLaMA3-Vision and Phi3-Vision models that beat their larger LLM competitors.

Github: github.com/mbzuai-oryx/LL…
HF: huggingface.co/collections/MB…
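For context, a minimal sketch of how a LLaVA-style vision LLM like these is typically queried with Hugging Face transformers. The repo id below is a placeholder (the collection URL in the tweet is truncated), and the released checkpoints may use a different processor class or prompt format.

```python
# Sketch: querying a LLaVA-style vision LLM via transformers.
# The model id is hypothetical; substitute the actual checkpoint from the collection.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "MBZUAI/LLaVA-Phi-3-mini-hf"  # placeholder repo id, not confirmed by the tweet
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```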

Zengyi Qin (@qinzytech)'s Twitter Profile Photo

Introducing OpenVoice V2, our latest voice clone model

· Clone Any Voice, Speak in Many Languages
· Totally Free, Open-Sourced

Now your voice goes global in multiple languages🤯

Joint work by MyShell and MIT CSAIL

Philipp Schmid (@_philschmid)'s Twitter Profile Photo

We can do it! 🙌 First open LLM outperforms OpenAI GPT-4 (March) on MT-Bench. WizardLM 2 is a fine-tuned and preference-trained Mixtral 8x22B! 🤯

TL;DR;
🧮 Mixtral 8x22B based (141B-A40 MoE)
🔓 Apache 2.0 license
🤖 First > 9.00 on MT-Bench with an open LLM
🧬 Used multi-step…
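A minimal sketch of how such a checkpoint would be used with transformers, assuming it ships a standard chat template. The model id is an assumption (the tweet does not give one), and a 141B-parameter MoE needs multiple GPUs or heavy quantization in practice.

```python
# Sketch: chatting with a Mixtral-8x22B-based checkpoint via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/WizardLM-2-8x22B"  # assumed repo id, not stated in the tweet
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain mixture-of-experts in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```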

🇺🇦Ukrainian Front (@front_ukrainian)'s Twitter Profile Photo

⚡️🇺🇦Ukrainian pilots flew by helicopter to a boy from a front-line village to thank him.

The boy always greeted the airmen with the flag, so they decided to meet him and presented him with a package of sweets, toys and food for his family.

AMD Radeon (@amdradeon)'s Twitter Profile Photo

We are working to release Micro-Engine Scheduler (MES) documentation towards the end of May and will follow up with published source code for external review and feedback. We have also opened a GitHub tracker, which will have the latest status on fixes and release dates.

Loreto Parisi (@loretoparisi)'s Twitter Profile Photo

JetMoE-8B has 24 blocks. Each block has two MoE layers: Mixture of Attention heads (MoA) and Mixture of MLP Experts (MoE). Each MoA and MoE layer has 8 experts, and 2 experts are activated for each input token. 💡 ModuleFormer: Modularity Emerges from MoE
arxiv.org/abs/2306.04640
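A toy sketch of the routing pattern described above (8 experts per layer, top-2 active per token). This is a generic top-k MoE MLP layer for illustration only, not the actual JetMoE/ModuleFormer code, which also applies the same idea to attention heads.

```python
# Toy top-2-of-8 mixture-of-experts MLP layer in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        top_w, top_idx = scores.topk(self.k, dim=-1)
        top_w = F.softmax(top_w, dim=-1)        # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):              # plain loops for clarity, not efficiency
            idx = top_idx[:, slot]
            w = top_w[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e                 # tokens that routed to expert e in this slot
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out

tokens = torch.randn(10, 512)
print(TopKMoE()(tokens).shape)                  # torch.Size([10, 512])
```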

Loreto Parisi (@loretoparisi)'s Twitter Profile Photo

1B Token Context Window is the largest C.W. to date. This implies as many as 750M English words (1×10^9 tokens × 0.75 words/token = 7.5×10^8). ICL capabilities at this scale are still to be proven, but a Needle In A Haystack (NIAH) test at 100% accuracy for the new FastLLM is impressive.
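A quick sanity check of that arithmetic, assuming the common rule of thumb of roughly 0.75 English words per token (the exact ratio varies by tokenizer and text):

```python
# Back-of-the-envelope token-to-word conversion for a 1B-token context window.
context_tokens = 1_000_000_000
words_per_token = 0.75  # rough heuristic, not an exact figure
print(f"{context_tokens * words_per_token:,.0f} words")  # 750,000,000 words
```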

Silke Hahn ✨ (@_SilkeHahn)'s Twitter Profile Photo

April Fools' joke or real? 🐣🤔🎧

Remember Voice Engine, OpenAI's voice-cloning program? One week earlier, the University of Texas had released a comparable tool: Voice Craft
the-decoder.de/open-source-st…

... which reportedly needs only 3 seconds of voice…

Loreto Parisi (@loretoparisi)'s Twitter Profile Photo

How do I return the response from an asynchronous call? (Good) old answer, still valid!
stackoverflow.com/a/36585554/758…
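The linked question and answer are about JavaScript, but the point carries over to any async runtime: you cannot return a value before the asynchronous call has completed; you have to await it (or consume it in a callback). A rough analogue of that pattern in Python's asyncio, for illustration only:

```python
# Sketch: the "return the result where you await it" pattern, asyncio version.
import asyncio

async def fetch_data():
    await asyncio.sleep(0.1)   # stands in for a network request
    return {"status": "ok"}

async def main():
    # Calling fetch_data() alone yields a coroutine object, not the data;
    # the result only exists once the coroutine is awaited.
    result = await fetch_data()
    print(result)

asyncio.run(main())
```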

Loreto Parisi (@loretoparisi)'s Twitter Profile Photo

Mixture of Experts (MoE)

‣ Mixtral 8x7B, 45B / 12B active = Mixtral8x7B-45x12B
‣ Qwen1.5-MoE-A2.7B, 14.3B / 2.7B active = Qwen1.5-14.3x2.7B
‣ Grok-1, 314B / 86B active = Grok-1-314x86B
‣ DBRX, 132B / 36B active = DBRX-132x36B
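The same figures expressed as active-to-total parameter ratios, using the numbers quoted above (not independently verified):

```python
# Active vs. total parameter counts (in billions) for the MoE models listed above.
models = {
    "Mixtral 8x7B":      (45.0, 12.0),
    "Qwen1.5-MoE-A2.7B": (14.3, 2.7),
    "Grok-1":            (314.0, 86.0),
    "DBRX":              (132.0, 36.0),
}
for name, (total_b, active_b) in models.items():
    print(f"{name:20s} {active_b:6.1f}B / {total_b:6.1f}B active = {active_b / total_b:.0%}")
```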

Loreto Parisi (@loretoparisi)'s Twitter Profile Photo

So basically this (DBRX, 132B params with 36B active) and Grok-1 (314B params MoE, 86B active) confirm that MoE-based LLMs are the architecture of choice for the 2024 horizon (I would not bet on it for 2025 yet).
