Nicolas Zucchet (@nicolaszucchet)'s Twitter Profile
Nicolas Zucchet

@nicolaszucchet

PhD student in NeuroAI @CSatETH | prev. @Polytechnique

ID: 936865625233817600

Link: https://nicolaszucchet.github.io | Joined: 02-12-2017 07:52:25

100 Tweets

317 Followers

274 Following

AK (@_akhaliq):


Google presents Griffin

Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN
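
For context, the core building block named in the title, a gated linear recurrence, can be sketched in a few lines. The NumPy toy below is an illustrative assumption of the general form, not the exact RG-LRU parameterization used in Hawk/Griffin; the gate names a and b are mine:

import numpy as np

def gated_linear_recurrence(x, a, b):
    # h_t = a_t * h_{t-1} + b_t * x_t, all elementwise.
    # a_t in (0, 1) acts as a learned forget gate, b_t gates the input.
    # Cost is O(T * d) in time and O(d) in state, which is why such
    # layers handle long sequences more cheaply than softmax attention.
    T, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(T):
        h = a[t] * h + b[t] * x[t]
        out[t] = h
    return out

# Toy usage with sigmoid gates (shapes and gating choices are assumptions):
rng = np.random.default_rng(0)
T, d = 16, 8
x = rng.normal(size=(T, d))
a = 1.0 / (1.0 + np.exp(-rng.normal(size=(T, d))))
y = gated_linear_recurrence(x, a, 1.0 - a)
print(y.shape)  # (16, 8)

Per the title, Griffin interleaves recurrent blocks of this kind with local (sliding-window) attention layers.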
Antonio Orvieto (@orvieto_antonio):

S4, Mamba, and Hawk/Griffin are great – but do we really understand how they work? We fully characterize the power of gated (selective) SSMs mathematically using powerful tools from Rough Path Theory. All thanks to our math magician Nicola Muça Cirone arxiv.org/pdf/2402.19047… 🧵

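
For concreteness, the gated (selective) SSMs the thread refers to follow a linear state recurrence whose parameters depend on the input; one common way to write it (my notation, not necessarily the paper's) is

\[
h_t = A(x_t)\, h_{t-1} + B(x_t)\, x_t, \qquad y_t = C\, h_t .
\]

When A and B are constant the layer reduces to a plain linear SSM like S4; making them functions of the input x_t is the "selectivity" or gating whose expressive power the rough-path analysis characterizes.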
Mackenzie Mathis, PhD (@trackingactions):

Interested in Computational Neuroscience in 🇨🇭? Looking for a PhD or master's thesis project? 👀 Check out the impressive network of labs! #neuroAI #compneuro #AcademicTwitter #neuroTwitter #neuroX #machinelearning swisscompneuro.org

Charlotte Frenkel (@c_frenkel):


📢 Wondering how the neocortex works, how it is related to modern machine learning algorithms, and how this insight can be used to fuel next-gen neuromorphic hardware?
Have a look at this PhD opening in my team: tudelft.nl/over-tu-delft/…
Position open until filled, apply early!
Xiaolong Wang (@xiaolonw):

Cannot believe this finally happened! Over the last 1.5 years, we have been developing a new LLM architecture with linear complexity and expressive hidden states for long-context modeling. The following plots show that our model trained on Books scales better (from 125M to 1.3B)
Michael E. Sander (@m_e_sander):


🚨🚨New ICML 2024 Paper: arxiv.org/abs/2402.05787

How do Transformers perform In-Context Autoregressive Learning?

We investigate how causal Transformers learn simple autoregressive processes of order 1.

with Raja Giryes 💔, Taiji Suzuki, Mathieu Blondel and Gabriel Peyré 🙏
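
As a reminder of the setting (my gloss of the abstract, not the paper's exact notation), an order-1 autoregressive process generates each token linearly from the previous one:

\[
x_{t+1} = W\, x_t ,
\]

so in-context autoregressive learning amounts to the trained Transformer implicitly estimating W from the context and applying it to predict the next token.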
Nicolas Zucchet (@nicolaszucchet):

I couldn't recommend this tutorial more highly. It brings extremely sharp insights into the strengths and weaknesses of modern language models, with a level of scientific rigor that I have rarely seen elsewhere!