Alec Tschantz (@a_tschantz) 's Twitter Profile
Alec Tschantz

@a_tschantz

Applied AI lead @ verses.ai

ID: 961146888220233729

Link: https://scholar.google.com/citations?user=5NbVgO0AAAAJ&hl=en · Joined: 07-02-2018 07:57:30

1.1K Tweets

1.1K Followers

3.3K Following

Jeff Johnston (@wjeffjohnston) 's Twitter Profile Photo

How does the brain represent multiple different things at once in a single population of neurons? Justin Fine, Neuro Polarbear, Becket Ebitz, Seng Bum Michael Yoo and I show that it uses semi-orthogonal subspaces for each item. Preprint here: arxiv.org/abs/2309.07766 Tweets below! (1/n)
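A toy way to picture "semi-orthogonal subspaces" in a shared neural population (illustrative sizes only, not the paper's analysis pipeline; measuring overlap via principal angles is a generic choice for the example):

```python
# Toy sketch: two items encoded in low-dimensional subspaces of one neural
# population; "semi-orthogonal" = small but nonzero overlap between them.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, k = 100, 3          # population size and subspace dimension (made up)

# Orthonormal bases for two item subspaces (random, hence nearly orthogonal).
U1, _ = np.linalg.qr(rng.standard_normal((n_neurons, k)))
U2, _ = np.linalg.qr(rng.standard_normal((n_neurons, k)))

# Principal angles: singular values of U1^T U2 are the cosines of the angles.
cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
print("cosines of principal angles:", np.round(cosines, 3))   # near 0: almost orthogonal
print("overlap (mean squared cosine):", np.round(np.mean(cosines**2), 3))
```

Cosines near zero mean the two items barely interfere; moderate values would be the "semi" in semi-orthogonal.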

Hadi Vafaii (@hadivafaii) 's Twitter Profile Photo

Imagine a model that unites predictive coding, sparse coding, and rate coding — all of 'em codings — under Bayesian inference. Wouldn't that be amazing?

It’s already here: the Poisson Variational Autoencoder 👉🧵[1/n]

w/ Dekel Galor & Jacob Yates
📜preprint: arxiv.org/abs/2405.14473

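Not from the preprint, but a small hedged aside on why Poisson latents pair naturally with sparsity: the KL term a Poisson-latent VAE pays against a low-rate Poisson prior has a textbook closed form that charges for firing.

```python
# KL( Poisson(r) || Poisson(r0) ) = r*log(r/r0) + r0 - r, the rate penalty a
# Poisson-latent ELBO would include; it grows with the firing rate r, which is
# one way a metabolic / sparse-coding cost can show up.
import numpy as np

def kl_poisson(r, r0):
    r, r0 = np.asarray(r, float), np.asarray(r0, float)
    return r * np.log(r / r0) + r0 - r

rates = np.array([0.1, 0.5, 1.0, 2.0, 5.0])
print(np.round(kl_poisson(rates, r0=1.0), 3))   # zero at r == r0, grows with r
```
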
Kevin Patrick Murphy (@sirbayes) 's Twitter Profile Photo

I am delighted to share our recent paper: arxiv.org/abs/2405.19681. It can be thought of as a version of the Bayesian Learning Rule, extended to the fully online setting. This was a super fun project with Peter Chang and the unstoppable Matt Jones (matt.colorado.edu). By
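As a generic illustration of the fully online setting (conjugate Bayesian linear regression updated one observation at a time; this is a textbook recursion, not the paper's extension of the Bayesian Learning Rule):

```python
# Minimal sketch of fully online (recursive) Bayesian learning: Bayesian linear
# regression with a Gaussian prior, posterior updated as each sample streams in.
import numpy as np

rng = np.random.default_rng(1)
d, sigma2, alpha = 3, 0.25, 1.0             # input dim, noise variance, prior precision
w_true = rng.standard_normal(d)

Lam = alpha * np.eye(d)                     # posterior precision, starts at the prior
b = np.zeros(d)                             # precision-weighted mean, Lam @ mu

for t in range(200):                        # one observation per step
    x = rng.standard_normal(d)
    y = x @ w_true + np.sqrt(sigma2) * rng.standard_normal()
    Lam += np.outer(x, x) / sigma2          # recursive precision update
    b += x * y / sigma2                     # recursive mean update (natural params)
    mu = np.linalg.solve(Lam, b)            # current posterior mean

print("posterior mean:", np.round(mu, 2), " true weights:", np.round(w_true, 2))
```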

AK (@_akhaliq) 's Twitter Profile Photo

Transformers are SSMs

Generalized Models and Efficient Algorithms Through Structured State Space Duality

While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown

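A minimal numerical sketch of the duality the title points at, with a scalar state and made-up parameters rather than anything from the paper: the same time-varying linear recurrence can be evaluated either as a scan or as one multiplication by a lower-triangular, attention-like matrix.

```python
# Toy SSM <-> matrix-mixer duality: a scalar, time-varying linear recurrence
# computed (1) step by step and (2) as a single lower-triangular matmul.
import numpy as np

rng = np.random.default_rng(0)
T = 8
x = rng.standard_normal(T)                  # input sequence
a = rng.uniform(0.5, 0.99, T)               # per-step decay A_t
b = rng.standard_normal(T)                  # input projection B_t
c = rng.standard_normal(T)                  # output projection C_t

# (1) Recurrent form: h_t = a_t h_{t-1} + b_t x_t,  y_t = c_t h_t
h, y_rec = 0.0, np.zeros(T)
for t in range(T):
    h = a[t] * h + b[t] * x[t]
    y_rec[t] = c[t] * h

# (2) Matrix form: y = M x with M[t, s] = c_t * (a_{s+1} * ... * a_t) * b_s, s <= t
M = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        M[t, s] = c[t] * np.prod(a[s + 1:t + 1]) * b[s]
y_mat = M @ x

print(np.allclose(y_rec, y_mat))            # True: same sequence transformation
```
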
elvis (@omarsar0) 's Twitter Profile Photo

The Geometry of Concepts in LLMs

Studies the geometry of categorical concepts and how the hierarchical relations between them are encoded in LLMs.

Finding from the paper: "Simple categorical concepts are represented as simplices, hierarchically related concepts are orthogonal

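A toy rendering of the quoted geometry with hand-built vectors (not representations probed from an LLM): child concepts sit on a simplex whose internal directions are orthogonal to the parent-concept direction.

```python
# Toy "simplices + orthogonal hierarchy" picture, illustrative vectors only.
import numpy as np

rng = np.random.default_rng(0)
d = 16
parent = np.zeros(d); parent[0] = 1.0            # parent-concept direction

# Three child offsets: centered (they sum to zero, so the children form a
# triangle / 2-simplex) and projected to be orthogonal to the parent direction.
offsets = rng.standard_normal((3, d))
offsets -= offsets.mean(axis=0)
offsets -= np.outer(offsets @ parent, parent)
children = parent + offsets

print(np.round(offsets @ parent, 6))             # ~0: hierarchy is orthogonal
print(np.round(children.mean(axis=0)[:3], 6))    # mean child lies on the parent direction
```
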
fly51fly (@fly51fly) 's Twitter Profile Photo

[LG] Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Y Sun, X Li, K Dalal, J Xu… [Stanford University & UC San Diego & UC Berkeley] (2024)
arxiv.org/abs/2407.04620

- This paper proposes TTT (Test-Time Training) layers, a new class of sequence modeling layers

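A heavily reduced sketch of the idea in the title, where the hidden state is itself a small model that takes one self-supervised gradient step per token; the corruption, loss, and dimensions below are placeholders, not the paper's TTT-Linear or TTT-MLP layers.

```python
# Reduced "learn at test time" layer: the hidden state is a weight matrix W,
# updated by one gradient step of a self-supervised reconstruction loss per
# token, then used to produce that token's output.
import numpy as np

rng = np.random.default_rng(0)
T, d = 16, 8
X = rng.standard_normal((T, d))             # token stream
W = np.zeros((d, d))                        # hidden state = weights of a linear model
lr = 0.1

outputs = []
for x in X:
    x_corrupt = x + 0.1 * rng.standard_normal(d)      # simple corrupted view
    err = W @ x_corrupt - x                           # loss: 0.5 * ||W xc - x||^2
    W -= lr * np.outer(err, x_corrupt)                # update rule: one gradient step
    outputs.append(W @ x)                             # output rule: apply updated state
outputs = np.stack(outputs)
print(outputs.shape)                                  # (T, d): one output per token
```
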
Tommaso Salvatori (@tommsalvatori) 's Twitter Profile Photo

A new library to train predictive coding networks. A huge comparative study of multiple ways of training them. An advancement over current state-of-the-art. A large discussion on what did not work.

All of this, and more, in our new preprint. arxiv.org/pdf/2407.01163

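Separately from the library and the comparative study, a bare-bones version of the loop predictive coding networks are trained with (toy dimensions and the simplest possible energy):

```python
# Bare-bones predictive coding: relax latent activities by gradient descent on a
# prediction-error energy, then apply a local weight update. Toy model and data,
# not the library's training schemes.
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_lat = 10, 4
W_true = 0.5 * rng.standard_normal((d_obs, d_lat))   # ground-truth generator (for data)
W = 0.1 * rng.standard_normal((d_obs, d_lat))        # model weights: prediction = W z

def infer(x, W, n_steps=50, lr=0.1):
    """Relax z to minimize E = 0.5*||x - W z||^2 + 0.5*||z||^2."""
    z = np.zeros(d_lat)
    for _ in range(n_steps):
        eps = x - W @ z                          # prediction error at the data layer
        z += lr * (W.T @ eps - z)                # gradient descent on E w.r.t. z
    return z

for _ in range(1000):                            # outer training loop
    x = W_true @ rng.standard_normal(d_lat) + 0.05 * rng.standard_normal(d_obs)
    z = infer(x, W)
    eps = x - W @ z
    W += 0.01 * np.outer(eps, z)                 # local (error x activity) weight update

x_test = W_true @ rng.standard_normal(d_lat)
print("held-out reconstruction error:", round(float(np.linalg.norm(x_test - W @ infer(x_test, W))), 3))
```
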
Sukjun (June) Hwang (@sukjun_hwang) 's Twitter Profile Photo

So many interesting sequence mixers have been unveiled lately! Interested in designing new models on your own?
We introduce useful primitives that make your model sub-quadratic, flexible, and powerful! As an example, we present Hydra, a principled bidirectional extension of Mamba

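A rough sketch of what a "bidirectional extension" of a causal mixer can look like at this level of abstraction (an EMA-style scan run forward and backward and combined; not Hydra's actual quasiseparable construction):

```python
# Making a causal sequence mixer bidirectional: one forward pass, one pass over
# the reversed sequence, combined so every position sees both directions.
import numpy as np

def causal_mix(x, decay=0.9):
    """Simple causal scan: y_t = decay * y_{t-1} + x_t."""
    y = np.zeros_like(x)
    acc = 0.0
    for t, xt in enumerate(x):
        acc = decay * acc + xt
        y[t] = acc
    return y

x = np.arange(6, dtype=float)
y_fwd = causal_mix(x)                       # mixes in the past
y_bwd = causal_mix(x[::-1])[::-1]           # mixes in the future
y = y_fwd + y_bwd - x                       # combine; x_t was counted in both passes
print(y)
```
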
H.SHIMAZAKI (@h_shimazaki) 's Twitter Profile Photo

Here is my take on: "Explosive neural networks via higher-order interactions in curved statistical manifolds"
Miguel Aguilera, Pablo A. Morales, Fernando E. Rosas, Hideaki Shimazaki
arxiv.org/abs/2408.02326
on behalf of Miguel Aguilera, Pablo Morales, Fernando Rosas

Andreas Kirsch 🇺🇦 (@blackhc) 's Twitter Profile Photo

A small info-theory thread (or at least food for thought):

Why is the Bayesian Model Average the best choice? Really why?

I'll go through a naive argument (anyone has better references?), simple lower-bounds and decompositions, and pitch a "reverse mutual information"

1/15

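For the naive argument, a one-screen numerical check with toy numbers (not taken from the thread): on every data point, the log score of the Bayesian Model Average is at least the posterior-weighted average of the members' log scores, by Jensen's inequality.

```python
# Toy check: log of the mixture predictive >= weighted average of member log
# predictives (Jensen's inequality), pointwise.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_points = 4, 10
w = rng.dirichlet(np.ones(n_models))              # posterior model weights
p = rng.uniform(0.05, 0.95, (n_models, n_points)) # each model's predictive prob per point

bma_log = np.log(w @ p)                           # log of the model average
avg_log = w @ np.log(p)                           # weighted average of the logs
print(np.all(bma_log >= avg_log - 1e-12))         # True on every point
print(np.round((bma_log - avg_log).mean(), 4))    # the (nonnegative) gap
```
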
Ueli Rutishauser (@uelirutishauser) 's Twitter Profile Photo

Delighted to share our latest finding! We discovered that abstract representations emerge in the human hippocampus when learning to perform inference. This change in neural geometry is due to disentanglement of discovered latent and observable variables. Nature: nature.com/articles/s4158…

Richard Song (@xingyousong) 's Twitter Profile Photo

How does Google optimize its research and systems? We’ve revealed the secrets behind the Vizier Gaussian Process Bandit algorithm, the black-box optimizer that’s been run millions of times!

Paper: arxiv.org/abs/2408.11527
Code: github.com/google/vizier

Compared to other industry

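The paper and repo document the production system; purely as a from-scratch sketch of the algorithm family it belongs to (a Gaussian-process bandit with an upper-confidence-bound acquisition in plain numpy, not the Vizier API):

```python
# Generic GP bandit loop (RBF kernel + UCB acquisition) on a 1-D toy objective.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.5 * np.cos(5 * x)     # unknown black-box objective

def rbf(a, b, ls=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

X = list(rng.uniform(0, 2, 2))                        # two random initial trials
Y = [f(x) for x in X]
cand = np.linspace(0, 2, 200)                         # candidate grid

for _ in range(15):
    Xa, Ya = np.array(X), np.array(Y)
    K = rbf(Xa, Xa) + 1e-6 * np.eye(len(Xa))          # GP fit on observed trials
    Ks = rbf(cand, Xa)
    mu = Ks @ np.linalg.solve(K, Ya)                  # posterior mean on candidates
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    ucb = mu + 2.0 * np.sqrt(np.maximum(var, 0))      # upper confidence bound
    x_next = cand[np.argmax(ucb)]                     # next trial = acquisition argmax
    X.append(x_next); Y.append(f(x_next))

print("best x, f(x):", X[int(np.argmax(Y))], max(Y))
```
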
Weijie Su (@weijie444) 's Twitter Profile Photo

New Research (w/ amazing Hangfeng He)

"A Law of Next-Token Prediction in Large Language Models"

LLMs rely on NTP, but their internal mechanisms seem chaotic. It's difficult to discern how each layer processes data for NTP. Surprisingly, we discover a physics-like law on NTP:

Michael Levin (@drmichaellevin) 's Twitter Profile Photo

New paper: Chris L Buckley, Tim Lewens at Cambridge HPS, Beren Millidge, Alec Tschantz, Richard Watson
mdpi.com/1099-4300/26/9…
"Natural Induction: Spontaneous Adaptive Organisation without Natural Selection"
#evolution, #basalcognition, #physics
Abstract: Evolution by

Mick Bonner (@michaelfbonner) 's Twitter Profile Photo

Can we gain a deep understanding of neural representations through dimensionality reduction? Our new work shows that the visual representations of the human brain need to be understood in high dimensions. w/ Raj & Brice Ménard. arxiv.org/abs/2409.06843
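One generic way to quantify "needs high dimensions" (synthetic data here, not the paper's measurements) is the participation ratio of the covariance eigenspectrum; when the spectrum decays slowly, no small set of principal components summarizes the representation.

```python
# Effective dimensionality via the participation ratio of a covariance
# eigenspectrum: PR = (sum lambda)^2 / sum(lambda^2).
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_units = 2000, 500

def participation_ratio(X):
    lam = np.clip(np.linalg.eigvalsh(np.cov(X.T)), 0, None)
    return lam.sum() ** 2 / (lam ** 2).sum()

# Low-dimensional data: only 5 directions carry variance.
low = rng.standard_normal((n_samples, 5)) @ rng.standard_normal((5, n_units))
# Slowly decaying spectrum: variance spread over many directions.
scales = np.arange(1, n_units + 1) ** -0.5
high = rng.standard_normal((n_samples, n_units)) * scales

print("PR, low-dimensional:", round(participation_ratio(low), 1))    # close to 5
print("PR, heavy-tailed   :", round(participation_ratio(high), 1))   # much larger
```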

Conor Heins (@conorheins) 's Twitter Profile Photo

1⃣ Excited to share new research from the Machine Learning Foundations Lab at VERSES AI Research!

"Gradient-free variational learning with conditional mixture networks"

📰Paper: arxiv.org/abs/2408.16429
💻Code: github.com/VersesTech/cav…
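The CAVI updates for conditional mixture networks are the paper's own contribution; as a minimal reminder of what gradient-free, coordinate-ascent variational inference looks like, here is the textbook version on a toy 1-D Gaussian mixture with known variances (closed-form updates only).

```python
# Minimal coordinate-ascent VI (no gradients) on a toy 1-D Gaussian mixture with
# known unit variances: alternate closed-form updates for q(z) and q(mu_k).
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])  # data
K, sigma0_sq = 2, 10.0                      # components, prior variance on the means

m = np.array([-1.0, 1.0])                   # q(mu_k) = N(m_k, s_sq_k); start apart
s_sq = np.ones(K)

for _ in range(50):
    # q(z_i = k) proportional to exp( E_q[ log N(x_i | mu_k, 1) ] )
    logits = x[:, None] * m[None, :] - 0.5 * (m**2 + s_sq)[None, :]
    r = np.exp(logits - logits.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)       # responsibilities
    # q(mu_k): Gaussian with closed-form updates given the responsibilities
    Nk = r.sum(axis=0)
    s_sq = 1.0 / (1.0 / sigma0_sq + Nk)
    m = s_sq * (r * x[:, None]).sum(axis=0)

print(np.round(m, 2))                       # posterior means near the true -2 and 3
```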