Nick Alonso (@nick__alonso)'s Twitter Profile
Nick Alonso

@nick__alonso

AI & machine learning, neuroscience, philosophy.

ID: 1560823930851663873

Link: http://neuralnetnick.com · Joined: 20-08-2022 03:00:51

87 Tweets

91 Followers

166 Following

Zyphra (@zyphraai)'s Twitter Profile Photo

Zyphra is releasing our first reasoning model, ZR1-1.5B. This small but powerful reasoning model excels at both math and code, making it one of the best models in these categories for its size. It also uses 60% fewer reasoning tokens than comparable models.

🆓 Apache 2.0 license.
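A minimal sketch of running a checkpoint like this with Hugging Face transformers. The repo identifier "Zyphra/ZR1-1.5B", the prompt, and the token budget are assumptions for illustration, not details taken from the tweet:

    # Illustrative sketch only; the repo id and prompt format are assumed, not confirmed by the tweet.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Zyphra/ZR1-1.5B"  # assumed Hugging Face identifier
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = "What is the sum of the first 100 positive integers?"
    inputs = tokenizer(prompt, return_tensors="pt")
    # Reasoning models emit a chain of thought before the answer, so allow a generous token budget.
    outputs = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))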
Nick Alonso (@nick__alonso)'s Twitter Profile Photo

Learning effectively in real time during deployment, i.e. doing online continual learning, is important for many applications. It's also associated with theories of intelligence that emphasize learning efficiency, and it's an ability where the gap between animals and AI is large.
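As a concrete illustration of what "learning during deployment" means, here is a generic online-SGD loop (a sketch of the general setting, not any specific method referenced in the tweet): the model keeps updating on each example as it streams in, instead of being frozen after pretraining.

    # Generic online/continual learning loop: update on every example as it arrives at deployment time.
    import torch
    import torch.nn as nn

    model = nn.Linear(16, 4)                      # stand-in for a deployed model
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def deployment_stream():
        # Placeholder for real incoming data; yields (input, label) pairs one at a time.
        for _ in range(1000):
            yield torch.randn(1, 16), torch.randint(0, 4, (1,))

    for x, y in deployment_stream():
        prediction = model(x)                     # serve the prediction immediately
        loss = loss_fn(prediction, y)             # then learn from the observed outcome
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                          # one gradient step per example; no replay buffer here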

rishi (@overquantized)'s Twitter Profile Photo

reach out if you want to work with me and others on novel architectures for pretraining! dms are open: jobs.ashbyhq.com/zyphra/e509d43…

TensorWave (@tensorwavecloud)'s Twitter Profile Photo

It’s not just about GPUs. It’s about the ecosystem. Quentin Anthony joined Jeff Tatarchuk on the Beyond CUDA podcast to share how moving to AMD MI300X cut training costs at Zyphra. 📺 Watch the full episode on YouTube (link in comments).

Zyphra (@zyphraai)'s Twitter Profile Photo

Zyphra is excited to release Compressed Convolutional Attention (CCA), a novel attention mechanism that:
- Beats MHA, GQA, MLA for dense and MoE models
- Reduces training/prefill FLOPs
- Uses 3x fewer parameters vs MHA
- Matches GQA/MLA KV-cache sizes without quality penalty
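The thread lists results rather than the mechanism, so the sketch below is only an illustrative guess at what convolutional compression of attention inputs could look like; it is not Zyphra's published CCA design. The idea shown: a strided convolution shrinks keys and values along the sequence axis before ordinary attention, which reduces the KV cache.

    # Illustrative guess only, not the actual CCA mechanism: strided Conv1d compresses K/V
    # along the sequence dimension before scaled-dot-product attention (non-causal, for clarity).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvCompressedAttention(nn.Module):
        def __init__(self, dim, num_heads=8, stride=4):
            super().__init__()
            self.q_proj = nn.Linear(dim, dim)
            self.kv_proj = nn.Linear(dim, 2 * dim)
            self.compress = nn.Conv1d(2 * dim, 2 * dim, kernel_size=stride, stride=stride)
            self.out_proj = nn.Linear(dim, dim)
            self.num_heads = num_heads

        def forward(self, x):                           # x: (batch, seq, dim); seq divisible by stride
            b, s, d = x.shape
            q = self.q_proj(x)
            kv = self.kv_proj(x).transpose(1, 2)        # (batch, 2*dim, seq)
            kv = self.compress(kv).transpose(1, 2)      # (batch, seq/stride, 2*dim): smaller KV cache
            k, v = kv.chunk(2, dim=-1)
            def split(t):                               # -> (batch, heads, tokens, head_dim)
                return t.view(b, -1, self.num_heads, d // self.num_heads).transpose(1, 2)
            attn = F.scaled_dot_product_attention(split(q), split(k), split(v))
            return self.out_proj(attn.transpose(1, 2).reshape(b, s, d))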
Quentin Anthony (@quentinanthon15)'s Twitter Profile Photo

At this point in attention-free architectures, so many people have poisoned the well that it's just a well of poison. A "Transformer Killer™" drops once a month, and then the authors come back and "kill" transformers again like 5 months later.

Love the work, I'm knee-deep in a
Zyphra (@zyphraai)'s Twitter Profile Photo

Today Zyphra releases OVQ-attention, an advancement for efficient long-context processing!

Existing LLM layers compress input too much, leading to poor long-context understanding, or too little, leading to expensive memory+compute.

OVQ-attention is an alternative path. 🧵
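The tweet doesn't describe how OVQ-attention works; as background on the tradeoff it names, here is a rough back-of-the-envelope comparison of the two extremes (illustrative numbers only, not OVQ itself): full attention keeps a KV cache that grows linearly with context length, while a fixed-size recurrent state uses constant memory but squeezes all past context into it.

    # Rough illustration of the compression tradeoff described in the tweet (background, not OVQ).
    def kv_cache_bytes(seq_len, layers=32, heads=32, head_dim=128, bytes_per_value=2):
        # Full attention: keys and values are stored for every past token, so memory grows linearly.
        return 2 * layers * heads * head_dim * seq_len * bytes_per_value

    def recurrent_state_bytes(layers=32, state_dim=4096, bytes_per_value=2):
        # Fixed-size recurrent state: constant memory, but heavy compression of past context.
        return layers * state_dim * bytes_per_value

    for n in (1_000, 100_000, 1_000_000):
        print(f"{n:>9} tokens: attention KV cache ≈ {kv_cache_bytes(n) / 1e9:6.1f} GB, "
              f"recurrent state ≈ {recurrent_state_bytes() / 1e6:.2f} MB")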
Zyphra (@zyphraai)'s Twitter Profile Photo

Zyphra releases research on a new way to build hybrid models. We introduce a new architecture leveraging the complementary strengths of Transformers and RNNs for greater flexibility and performance than existing approaches.

We call it Hybrid Associative Memory (HAM). 🧵
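The tweet doesn't spell out the architecture; purely as a generic illustration of a Transformer/RNN hybrid (not the actual HAM design), a block might interleave a recurrent layer with an attention layer so the model gets both a compressed running state and exact token-level recall:

    # Generic Transformer/RNN hybrid block for illustration only; not Zyphra's HAM architecture.
    import torch
    import torch.nn as nn

    class HybridBlock(nn.Module):
        def __init__(self, dim, num_heads=8):
            super().__init__()
            self.rnn = nn.GRU(dim, dim, batch_first=True)        # recurrent path: fixed-size running state
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)  # attention path: exact recall
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

        def forward(self, x):                                     # x: (batch, seq, dim)
            rnn_out, _ = self.rnn(self.norm1(x))
            x = x + rnn_out                                       # residual around the recurrent layer
            h = self.norm2(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
            x = x + attn_out                                      # residual around the attention layer
            return x + self.mlp(x)                                # feed-forward with residual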
Jonathan Birch (@birchlse)'s Twitter Profile Photo

Computer scientists often seem incredibly confident one way or the other about computational functionalism. What they should say is that the arguments both for and against provide only inconclusive considerations and the right attitude is therefore one of great uncertainty.