Nick Alonso (@nick__alonso)'s Twitter Profile
Nick Alonso

@nick__alonso

AI & machine learning, neuroscience, philosophy.

ID: 1560823930851663873

Link: http://neuralnetnick.com · Joined: 20-08-2022 03:00:51

87 Tweets

91 Followers

166 Following

Zyphra (@zyphraai)'s Twitter Profile Photo

Zyphra is releasing our first reasoning model, ZR1-1.5B. This small but powerful reasoning model excels at both math and code, making it one of the best models in these categories for its size. It also uses 60% fewer reasoning tokens than comparable models.

🆓 Apache 2.0 license.
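A minimal sketch of running a checkpoint like this with Hugging Face transformers. The repo identifier "Zyphra/ZR1-1.5B", the prompt, and the token budget are assumptions for illustration, not details taken from the tweet:

    # Illustrative sketch only; the repo id and prompt format are assumed, not confirmed by the tweet.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Zyphra/ZR1-1.5B"  # assumed Hugging Face identifier
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = "What is the sum of the first 100 positive integers?"
    inputs = tokenizer(prompt, return_tensors="pt")
    # Reasoning models emit a chain of thought before the answer, so allow a generous token budget.
    outputs = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))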
Nick Alonso (@nick__alonso)'s Twitter Profile Photo

Learning effectively in real time during deployment, i.e. doing online continual learning, is important for many applications. It's also associated with theories of intelligence that emphasize learning efficiency, and it's an ability where the gap between animals and AI is large.
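As a concrete illustration of what "learning during deployment" means, here is a generic online-SGD loop (a sketch of the general setting, not any specific method referenced in the tweet): the model keeps updating on each example as it streams in, instead of being frozen after pretraining.

    # Generic online/continual learning loop: update on every example as it arrives at deployment time.
    import torch
    import torch.nn as nn

    model = nn.Linear(16, 4)                      # stand-in for a deployed model
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def deployment_stream():
        # Placeholder for real incoming data; yields (input, label) pairs one at a time.
        for _ in range(1000):
            yield torch.randn(1, 16), torch.randint(0, 4, (1,))

    for x, y in deployment_stream():
        prediction = model(x)                     # serve the prediction immediately
        loss = loss_fn(prediction, y)             # then learn from the observed outcome
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                          # one gradient step per example; no replay buffer here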

rishi (@overquantized)'s Twitter Profile Photo

reach out if you want to work with me and others on novel architectures for pretraining! dms are open: jobs.ashbyhq.com/zyphra/e509d43…

TensorWave (@tensorwavecloud)'s Twitter Profile Photo

It’s not just about GPUs. It’s about the ecosystem. Quentin Anthony joined Jeff Tatarchuk on the Beyond CUDA podcast to share how moving to AMD MI300X cut training costs at Zyphra. 📺 Watch the full episode on YouTube (link in comments).

Zyphra (@zyphraai)'s Twitter Profile Photo

Zyphra is excited to release Compressed Convolutional Attention (CCA), a novel attention mechanism that:
- Beats MHA, GQA, MLA for dense and MoE models
- Reduces training/prefill FLOPs
- Uses 3x fewer parameters vs MHA
- Matches GQA/MLA KV-cache sizes without quality penalty
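The thread lists results rather than the mechanism, so the sketch below is only an illustrative guess at what convolutional compression of attention inputs could look like; it is not Zyphra's published CCA design. The idea shown: a strided convolution shrinks keys and values along the sequence axis before ordinary attention, which reduces the KV cache.

    # Illustrative guess only, not the actual CCA mechanism: strided Conv1d compresses K/V
    # along the sequence dimension before scaled-dot-product attention (non-causal, for clarity).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvCompressedAttention(nn.Module):
        def __init__(self, dim, num_heads=8, stride=4):
            super().__init__()
            self.q_proj = nn.Linear(dim, dim)
            self.kv_proj = nn.Linear(dim, 2 * dim)
            self.compress = nn.Conv1d(2 * dim, 2 * dim, kernel_size=stride, stride=stride)
            self.out_proj = nn.Linear(dim, dim)
            self.num_heads = num_heads

        def forward(self, x):                           # x: (batch, seq, dim); seq divisible by stride
            b, s, d = x.shape
            q = self.q_proj(x)
            kv = self.kv_proj(x).transpose(1, 2)        # (batch, 2*dim, seq)
            kv = self.compress(kv).transpose(1, 2)      # (batch, seq/stride, 2*dim): smaller KV cache
            k, v = kv.chunk(2, dim=-1)
            def split(t):                               # -> (batch, heads, tokens, head_dim)
                return t.view(b, -1, self.num_heads, d // self.num_heads).transpose(1, 2)
            attn = F.scaled_dot_product_attention(split(q), split(k), split(v))
            return self.out_proj(attn.transpose(1, 2).reshape(b, s, d))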
Quentin Anthony (@quentinanthon15)'s Twitter Profile Photo

At this point in attention-free architectures, so many people have poisoned the well that it's just a well of poison. A "Transformer Killer™" drops once a month, and then the authors come back and "kill" transformers again like 5 months later.

Love the work, I'm knee-deep in a
Zyphra (@zyphraai)'s Twitter Profile Photo

Today Zyphra releases OVQ-attention, an advancement for efficient long-context processing!

Existing LLM layers compress input too much, leading to poor long-context understanding, or too little, leading to expensive memory+compute.

OVQ-attention is an alternative path. 🧵
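The tweet doesn't describe how OVQ-attention works; as background on the tradeoff it names, here is a rough back-of-the-envelope comparison of the two extremes (illustrative numbers only, not OVQ itself): full attention keeps a KV cache that grows linearly with context length, while a fixed-size recurrent state uses constant memory but squeezes all past context into it.

    # Rough illustration of the compression tradeoff described in the tweet (background, not OVQ).
    def kv_cache_bytes(seq_len, layers=32, heads=32, head_dim=128, bytes_per_value=2):
        # Full attention: keys and values are stored for every past token, so memory grows linearly.
        return 2 * layers * heads * head_dim * seq_len * bytes_per_value

    def recurrent_state_bytes(layers=32, state_dim=4096, bytes_per_value=2):
        # Fixed-size recurrent state: constant memory, but heavy compression of past context.
        return layers * state_dim * bytes_per_value

    for n in (1_000, 100_000, 1_000_000):
        print(f"{n:>9} tokens: attention KV cache ≈ {kv_cache_bytes(n) / 1e9:6.1f} GB, "
              f"recurrent state ≈ {recurrent_state_bytes() / 1e6:.2f} MB")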
Zyphra (@zyphraai)'s Twitter Profile Photo

Zyphra releases research on a new way to build hybrid models. We introduce a new architecture leveraging the complementary strengths of Transformers and RNNs for greater flexibility and performance than existing approaches.

We call it Hybrid Associative Memory (HAM). 🧵
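The tweet doesn't spell out the architecture; purely as a generic illustration of a Transformer/RNN hybrid (not the actual HAM design), a block might interleave a recurrent layer with an attention layer so the model gets both a compressed running state and exact token-level recall:

    # Generic Transformer/RNN hybrid block for illustration only; not Zyphra's HAM architecture.
    import torch
    import torch.nn as nn

    class HybridBlock(nn.Module):
        def __init__(self, dim, num_heads=8):
            super().__init__()
            self.rnn = nn.GRU(dim, dim, batch_first=True)        # recurrent path: fixed-size running state
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)  # attention path: exact recall
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

        def forward(self, x):                                     # x: (batch, seq, dim)
            rnn_out, _ = self.rnn(self.norm1(x))
            x = x + rnn_out                                       # residual around the recurrent layer
            h = self.norm2(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
            x = x + attn_out                                      # residual around the attention layer
            return x + self.mlp(x)                                # feed-forward with residual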
Jonathan Birch (@birchlse)'s Twitter Profile Photo

Computer scientists often seem incredibly confident one way or the other about computational functionalism. What they should say is that the arguments both for and against provide only inconclusive considerations and the right attitude is therefore one of great uncertainty.