Nicolas Zucchet (@nicolaszucchet)'s Twitter Profile
Nicolas Zucchet

@nicolaszucchet

PhD student in NeuroAI @CSatETH | prev. @Polytechnique

ID: 936865625233817600

Link: https://nicolaszucchet.github.io | Joined: 02-12-2017 07:52:25

100 Tweets

317 Followers

274 Following

AK (@_akhaliq):


Google presents Griffin

Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN
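
For context, the core building block named in the title, a gated linear recurrence, can be sketched in a few lines. The NumPy toy below is an illustrative assumption of the general form, not the exact RG-LRU parameterization used in Hawk/Griffin; the gate names a and b are mine:

import numpy as np

def gated_linear_recurrence(x, a, b):
    # h_t = a_t * h_{t-1} + b_t * x_t, all elementwise.
    # a_t in (0, 1) acts as a learned forget gate, b_t gates the input.
    # Cost is O(T * d) in time and O(d) in state, which is why such
    # layers handle long sequences more cheaply than softmax attention.
    T, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(T):
        h = a[t] * h + b[t] * x[t]
        out[t] = h
    return out

# Toy usage with sigmoid gates (shapes and gating choices are assumptions):
rng = np.random.default_rng(0)
T, d = 16, 8
x = rng.normal(size=(T, d))
a = 1.0 / (1.0 + np.exp(-rng.normal(size=(T, d))))
y = gated_linear_recurrence(x, a, 1.0 - a)
print(y.shape)  # (16, 8)

Per the title, Griffin interleaves recurrent blocks of this kind with local (sliding-window) attention layers.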
Antonio Orvieto (@orvieto_antonio):

S4, Mamba, and Hawk/Griffin are great – but do we really understand how they work? We fully characterize the power of gated (selective) SSMs mathematically using powerful tools from Rough Path Theory. All thanks to our math magician Nicola Muça Cirone arxiv.org/pdf/2402.19047… 🧵

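
For concreteness, the gated (selective) SSMs the thread refers to follow a linear state recurrence whose parameters depend on the input; one common way to write it (my notation, not necessarily the paper's) is

\[
h_t = A(x_t)\, h_{t-1} + B(x_t)\, x_t, \qquad y_t = C\, h_t .
\]

When A and B are constant the layer reduces to a plain linear SSM like S4; making them functions of the input x_t is the "selectivity" or gating whose expressive power the rough-path analysis characterizes.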
Mackenzie Mathis, PhD (@trackingactions):

Interested in Computational Neuroscience in 🇨🇭? Looking for a PhD or master's thesis project? 👀 Check out the impressive network of labs! #neuroAI #compneuro #AcademicTwitter #neuroTwitter #neuroX #machinelearning swisscompneuro.org

Charlotte Frenkel (@c_frenkel):


📢 Wondering how the neocortex works, how it is related to modern machine learning algorithms, and how this insight can be used to fuel next-gen neuromorphic hardware?
Have a look at this PhD opening in my team: tudelft.nl/over-tu-delft/…
Position open until filled, apply early!
Xiaolong Wang (@xiaolonw):

Cannot believe this finally happened! Over the last 1.5 years, we have been developing a new LLM architecture with linear complexity and expressive hidden states for long-context modeling. The following plots show that our model trained on Books scales better (from 125M to 1.3B)
Michael E. Sander (@m_e_sander):


🚨🚨New ICML 2024 Paper: arxiv.org/abs/2402.05787

How do Transformers perform In-Context Autoregressive Learning?

We investigate how causal Transformers learn simple autoregressive processes of order 1.

with Raja Giryes 💔, Taiji Suzuki, Mathieu Blondel and Gabriel Peyré 🙏
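
As a reminder of the setting (my gloss of the abstract, not the paper's exact notation), an order-1 autoregressive process generates each token linearly from the previous one:

\[
x_{t+1} = W\, x_t ,
\]

so in-context autoregressive learning amounts to the trained Transformer implicitly estimating W from the context and applying it to predict the next token.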
Nicolas Zucchet (@nicolaszucchet):

I couldn't recommend this tutorial more highly. It brings extremely sharp insights into the strengths and weaknesses of modern language models, with a level of scientific rigor that I have rarely seen elsewhere!