Ali Behrouz (@behrouz_ali)'s Twitter Profile
Ali Behrouz

@behrouz_ali

Intern @Google, Ph.D. Student @Cornell_CS, interested in machine learning.

ID: 1611553104532762624

Link: https://abehrouz.github.io/ · Joined: 07-01-2023 02:39:47

111 Tweets

4.4K Followers

1.1K Following

Ali Behrouz (@behrouz_ali):

What makes attention the critical component for most advances in LLMs and what holds back long-term memory modules (RNNs)? Can we strictly generalize Transformers?

Presenting Atlas (A powerful Titan): a new architecture with long-term in-context memory that learns how to
TuringPost (@theturingpost):

Last week, Google dropped a paper on ATLAS, a new architecture that reimagines how models learn and use memory.

Unfortunately, it flew under everyone’s radar - but it shouldn’t have! So what's Atlas bringing to the table?

▪️ Active memory via Google’s so-called Omega rule. It
leloy! (@leloykun):

Fast, Numerically Stable, and Auto-Differentiable Spectral Clipping via Newton-Schulz Iteration

Hi all, I'm bacc. I have a lot to talk about, but let's start with this fun side-project.

Here I'll talk about novel (?) ways to compute:
1. Spectral Clipping (discussed in Rohan's
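
Below is a minimal NumPy sketch of the ingredients, assuming the classic cubic Newton-Schulz variant; it is not the thread's implementation, which focuses on making this fast, numerically stable, and auto-differentiable without an explicit SVD.

```python
import numpy as np

def newton_schulz_polar(W, steps=20):
    """Approximate the orthogonal polar factor U @ V.T of W = U S V.T with the
    cubic Newton-Schulz iteration X <- 1.5*X - 0.5*X X^T X. Converges when the
    singular values of the initial X lie in (0, sqrt(3)), so we pre-scale by the
    Frobenius norm."""
    X = W / np.linalg.norm(W)                  # all singular values now <= 1
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X                                   # ~ msign(W) = U V^T

def spectral_clip_svd(W, beta):
    """Reference implementation: cap every singular value of W at beta via SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.minimum(S, beta)) @ Vt

# quick sanity check
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
UVt = newton_schulz_polar(W)
print(np.allclose(UVt.T @ UVt, np.eye(32), atol=1e-3))             # columns ~orthonormal
print(np.linalg.norm(spectral_clip_svd(W, 1.0), 2) <= 1.0 + 1e-6)  # spectral norm capped at 1
```
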
Yingheng Wang (@yingheng_wang):

❓ Are LLMs actually problem solvers or just good at regurgitating facts?

🚨New Benchmark Alert! We built HeuriGym to benchmark if LLMs can craft real heuristics for real-world hard combinatorial optimization problems.

🛞 We’re open-sourcing it all:
✅ 9 problems
✅ Iterative
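
For a concrete sense of the kind of artifact such a benchmark would score, here is a hypothetical example (not from HeuriGym itself): an LLM-proposed greedy heuristic for set cover, plus a tiny harness that checks feasibility and reports solution cost.

```python
from typing import FrozenSet, List, Set

def greedy_set_cover(universe: Set[int], subsets: List[FrozenSet[int]]) -> List[int]:
    """A heuristic of the kind an LLM might propose: repeatedly pick the subset
    that covers the most still-uncovered elements (illustrative example)."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("instance is infeasible")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

# toy harness: feasibility check + score (number of subsets used, lower is better)
universe = set(range(10))
subsets = [frozenset({0, 1, 2, 3}), frozenset({3, 4, 5}),
           frozenset({5, 6, 7, 8, 9}), frozenset({0, 9})]
solution = greedy_set_cover(universe, subsets)
assert set().union(*(subsets[i] for i in solution)) == universe
print("subsets used:", len(solution))
```
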
Reza Bayat (@reza_byt):

📄 New Paper Alert! ✨

🚀Mixture of Recursions (MoR): Smaller models • Higher accuracy • Greater throughput

Across 135M–1.7B params, MoR carves a new Pareto frontier: equal training FLOPs yet lower perplexity, higher few-shot accuracy, and more than 2x throughput.
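
As a hedged sketch of the idea suggested by the name (one shared block applied recursively, with a router assigning each token its own recursion depth), here is a toy PyTorch module; the class and hard routing scheme are illustrative, not the paper's architecture, and a real implementation would route for efficiency instead of computing the block for every token at every step.

```python
import torch
import torch.nn as nn

class MixtureOfRecursions(nn.Module):
    """Illustrative sketch (not the paper's exact architecture): a single shared
    transformer block is applied recursively, and a tiny router decides, per token,
    how many recursion steps that token receives."""
    def __init__(self, d_model=256, n_heads=4, max_recursions=3):
        super().__init__()
        self.shared_block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.router = nn.Linear(d_model, max_recursions)  # scores for depths 1..R
        self.max_recursions = max_recursions

    def forward(self, x):                           # x: (batch, seq, d_model)
        # hard argmax routing for clarity (non-differentiable as written)
        depth = self.router(x).argmax(dim=-1) + 1   # (batch, seq), values in 1..R
        h = x
        for step in range(1, self.max_recursions + 1):
            updated = self.shared_block(h)
            active = (depth >= step).unsqueeze(-1)  # tokens still recursing
            h = torch.where(active, updated, h)     # inactive tokens keep their state
        return h

x = torch.randn(2, 16, 256)
print(MixtureOfRecursions()(x).shape)   # torch.Size([2, 16, 256])
```
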
Vahab Mirrokni (@mirrokni):

Proud to announce an official Gold Medal at #IMO2025🥇

The IMO committee has certified the result from our general-purpose Gemini system—a landmark moment for our team and for the future of AI reasoning.

deepmind.google/discover/blog/… (1/n) Highlights in thread:

Ali Behrouz (@behrouz_ali):

Everyone is talking about reviewers who don't engage or provide low-quality reviews. While harmful, I don't see that as the biggest threat to the peer review system. As both an author and reviewer, I'm seeing zero-sum debates where a reviewer puts their full effort into rejecting

Gabriel Mongaras (@gmongaras):

Threw a paper I've been working on onto ArXiv. Trying to get a little closer to understanding why softmax in attention works so well compared to other activation functions. arxiv.org/abs/2507.23632
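
To make the question concrete, here is a small sketch (not the paper's setup or analysis) that swaps the activation applied to the attention scores; softmax normalizes each row into a distribution, while alternatives like ReLU or sigmoid do not.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, activation="softmax"):
    """Single-head attention with a swappable activation on the score matrix.
    Illustrates the question being studied, not the paper's analysis."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (..., seq, seq)
    if activation == "softmax":
        weights = F.softmax(scores, dim=-1)                 # rows sum to 1
    elif activation == "relu":
        weights = F.relu(scores)                            # unnormalized, can be all-zero
    elif activation == "sigmoid":
        weights = torch.sigmoid(scores)                     # bounded, rows don't sum to 1
    else:
        raise ValueError(activation)
    return weights @ v

q = k = v = torch.randn(1, 8, 32)
for act in ("softmax", "relu", "sigmoid"):
    print(act, attention(q, k, v, act).shape)
```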

DeepLearning.AI (@deeplearningai):

Google researchers introduced ATLAS, a transformer-like language model architecture. ATLAS replaces attention with a trainable memory module and processes inputs up to 10 million tokens. 

The team trained a 1.3 billion-parameter model on FineWeb, updating only the memory module
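
As a hedged sketch of the general pattern behind such trainable memory modules (in the spirit of Titans/ATLAS, not the paper's Omega rule or exact parameterization), here is a toy memory: an MLP updated at test time by gradient steps on a key-to-value reconstruction loss.

```python
import torch
import torch.nn as nn

class NeuralMemory(nn.Module):
    """Hedged sketch of a test-time-trained memory: an MLP M is updated online so
    that M(key_t) ~ value_t. Simplified illustration, not the paper's update rule."""
    def __init__(self, d=64, lr=0.1):
        super().__init__()
        self.memory = nn.Sequential(nn.Linear(d, 2 * d), nn.GELU(), nn.Linear(2 * d, d))
        self.lr = lr

    def write(self, key, value):
        # one gradient step on the reconstruction loss ||M(key) - value||^2
        loss = ((self.memory(key) - value) ** 2).mean()
        grads = torch.autograd.grad(loss, list(self.memory.parameters()))
        with torch.no_grad():
            for p, g in zip(self.memory.parameters(), grads):
                p -= self.lr * g

    def read(self, query):
        with torch.no_grad():
            return self.memory(query)

mem = NeuralMemory(lr=0.5)
k, v = torch.randn(4, 64), torch.randn(4, 64)
print("before:", ((mem.read(k) - v) ** 2).mean().item())
for _ in range(200):
    mem.write(k, v)                       # memorize the key->value associations in-weights
print("after: ", ((mem.read(k) - v) ** 2).mean().item())   # reconstruction error shrinks
```
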
Hamed Mahdavi (@hamedmahdavi93):

This paper from Penn State researchers literally blew my mind🤯🤯 Just kidding, I am excited to share our work on model merging! We leveraged the connection between Adam optimizer's second moments and curvature information /
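
A hedged sketch of what curvature-weighted merging can look like, using Adam's second-moment estimates as a per-parameter curvature proxy; the function and weighting below are illustrative, not the paper's estimator.

```python
import torch

def curvature_weighted_merge(params, second_moments, eps=1e-8):
    """Hedged sketch of curvature-aware model merging: average each parameter across
    models, weighting by a curvature proxy taken from Adam's second-moment estimates.

    params:         list of dicts  name -> tensor        (one dict per model)
    second_moments: list of dicts  name -> tensor        (Adam exp_avg_sq per model)
    """
    merged = {}
    for name in params[0]:
        weights = [v[name].sqrt() + eps for v in second_moments]   # curvature proxies
        total = sum(weights)
        merged[name] = sum(w * p[name] for w, p in zip(weights, params)) / total
    return merged

# toy usage with two hypothetical fine-tuned models
p1 = {"w": torch.tensor([1.0, 2.0])}
p2 = {"w": torch.tensor([3.0, 0.0])}
v1 = {"w": torch.tensor([4.0, 0.01])}   # model 1 has high curvature on w[0]
v2 = {"w": torch.tensor([0.01, 4.0])}   # model 2 has high curvature on w[1]
print(curvature_weighted_merge([p1, p2], [v1, v2]))
# each coordinate is dominated by the model with higher curvature there
```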

Yunhao Fang (@fangyunhao_x):

Transformers hit the memory wall, RNNs hit the forgetting wall.

TLDR: We introduce Artificial Hippocampus Networks (AHNs), a lightweight add-on (<0.5% extra parameters) that compresses infinite context into a fixed-size memory for efficient long-context modeling.
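
Here is a hedged sketch of the general pattern such add-ons follow (exact attention over a recent window plus a small recurrent state that absorbs evicted tokens); the module below is an illustration, not the paper's AHN.

```python
import torch
import torch.nn as nn

class CompressiveMemoryBlock(nn.Module):
    """Hedged sketch of the pattern AHN-style add-ons follow (not the paper's module):
    exact attention over a recent sliding window, plus a small GRU that compresses
    tokens evicted from the window into a fixed-size memory vector."""
    def __init__(self, d=128, window=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.compressor = nn.GRUCell(d, d)           # lightweight add-on memory
        self.window = window

    def forward(self, x):                             # x: (batch, seq, d)
        B, T, D = x.shape
        memory = torch.zeros(B, D, device=x.device)   # fixed-size long-term state
        # compress every token that has already slid out of the attention window
        for t in range(max(0, T - self.window)):
            memory = self.compressor(x[:, t], memory)
        recent = x[:, -self.window:]                  # exact attention over the window
        ctx = torch.cat([memory.unsqueeze(1), recent], dim=1)
        out, _ = self.attn(recent, ctx, ctx)          # queries attend to memory + window
        return out

x = torch.randn(2, 256, 128)
print(CompressiveMemoryBlock()(x).shape)              # torch.Size([2, 64, 128])
```
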
himanshu dubey (@himanshustwts):

Author of Titans and Atlas from DeepMind is the upcoming guest on Ground Zero pod!

it was quite a chat w him on the recent progress with Titans, Atlas and their real adoption.
Tilde (@tilderesearch):

Modern optimizers can struggle with unstable training. Building off of Manifold Muon, we explore more lenient mechanisms for constraining the geometry of a neural network's weights directly through their Gram matrix 🧠 A 🧵… ~1/6~
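
One lenient way to constrain weight geometry through the Gram matrix is a soft penalty pulling WᵀW toward a target (identity below, i.e. soft orthogonality); this is an illustrative sketch, not Tilde's actual mechanism.

```python
import torch

def gram_penalty(W, target=None):
    """Hedged sketch: penalize the deviation of the Gram matrix W^T W from a target
    (identity here, i.e. a soft orthogonality constraint on the weights)."""
    gram = W.T @ W
    if target is None:
        target = torch.eye(W.shape[1], device=W.device)
    return ((gram - target) ** 2).mean()

# in practice this penalty would be added to the task loss; shown alone for illustration
W = torch.nn.Parameter(torch.randn(128, 64) / 128 ** 0.5)
opt = torch.optim.SGD([W], lr=0.1)
for step in range(200):
    loss = gram_penalty(W)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(gram_penalty(W).item())   # deviation of W^T W from identity shrinks toward 0
```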