Felipe Cruz-Salinas (@fffffelipec)'s Twitter Profile
Felipe Cruz-Salinas

@fffffelipec

Pre-training @cohere

ID: 1731942561298669568

Link: https://afcruzs.github.io/ | Joined: 05-12-2023 07:44:30

127 Tweets

192 Followers

490 Following

Cohere Labs (@cohere_labs)'s Twitter Profile Photo

Introducing ✨ Aya Vision ✨ - an open-weights model to connect our world through language and vision. Aya Vision adds breakthrough multimodal capabilities to our state-of-the-art multilingual 8B and 32B models. 🌿

Command A(idan) (@aidangomez)'s Twitter Profile Photo

Today cohere is very excited to introduce Command A, our new model succeeding Command R+. Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. 🧵

Felipe Cruz-Salinas (@fffffelipec)'s Twitter Profile Photo

This is what we've been hard at work on for the last few months :) Command A is great at long context (256k easily), multilinguality, and throughput overall. Pre-training the base model and all the work leading up to that was super rewarding. I'm very happy it's out now 😌

lmarena.ai (formerly lmsys.org) (@lmarena_ai)'s Twitter Profile Photo

🚀 Big news: cohere's latest Command A now climbs to #13 on Arena!

Another organization joining the top-15 club - congrats to the Cohere team!

Highlights:
- open-weight model (111B)
- 256K context window
- $2.5/$10 input/output MTok

More analysis👇
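Taking the listed rates at face value ($2.5 per million input tokens, $10 per million output tokens), here is a quick back-of-the-envelope cost check; the request size below is made up purely for illustration:

```python
# Rough cost estimate at the listed per-million-token rates.
input_rate, output_rate = 2.5, 10.0            # USD per 1M tokens (from the post above)
input_tokens, output_tokens = 100_000, 20_000  # hypothetical request size

cost = input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate
print(f"${cost:.2f}")  # -> $0.45
```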
cohere (@cohere)'s Twitter Profile Photo

We’re redefining what’s possible with AI. With the release of our latest model, Command A, optimized for real-world agentic and multilingual tasks, we’re demonstrating our commitment to bringing enterprises AI that goes beyond the ordinary and offers security & efficiency.

Marzieh Fadaee (@mziizm)'s Twitter Profile Photo

1/ Science is only as strong as the benchmarks it relies on.

So how fair—and scientifically rigorous—is today’s most widely used evaluation benchmark?

We took a deep dive into Chatbot Arena to find out. 🧵
Irem Ergün (@irombie)'s Twitter Profile Photo

I'm excited to share our new pre-print, ShiQ: Bringing back Bellman to LLMs! arxiv.org/abs/2505.11081

In this work, we propose a new, Q-learning-inspired RL algorithm for finetuning LLMs 🎉 (1/n)
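(Not from the paper: purely to fix ideas about what "bringing Bellman back" to LLM finetuning can look like, here is a generic, minimal TD-style objective over per-token value estimates. The function, shapes, and terminal-reward placement are illustrative assumptions, not the ShiQ algorithm itself.)

```python
import torch
import torch.nn.functional as F

# Generic, minimal sketch of a Bellman/TD-style loss over per-token value
# estimates for one sampled completion. NOT the ShiQ algorithm; names,
# shapes, and the terminal-reward placement are illustrative assumptions.
def td_loss(q_values: torch.Tensor, reward: float, gamma: float = 1.0) -> torch.Tensor:
    """q_values: (T,) value estimates, one per generated token."""
    targets = torch.empty_like(q_values)
    targets[:-1] = gamma * q_values[1:].detach()  # bootstrap from the next token's value
    targets[-1] = reward                          # sequence-level reward at the last token
    return F.mse_loss(q_values, targets)

q = torch.randn(8, requires_grad=True)  # toy per-token value estimates
loss = td_loss(q, reward=1.0)
loss.backward()
```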

Sander Land (@magikarp_tokens)'s Twitter Profile Photo

🔠 UTF-8 was never meant for language models.
Yet every major tokenizer still uses it, creating unfair "byte premiums".
Why should your native script cost more to tokenize? It's time for a change. 🧵👇
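To make the "byte premium" concrete (a toy illustration of my own, not from the thread): under UTF-8, comparable short phrases in different scripts encode to very different byte counts, so byte-based tokenizers charge some scripts more from the start.

```python
# Toy illustration of UTF-8 "byte premiums": similar short greetings cost
# very different numbers of bytes depending on the script.
greetings = {
    "English":  "hello",
    "Greek":    "γεια σου",
    "Hindi":    "नमस्ते",
    "Japanese": "こんにちは",
}

for lang, text in greetings.items():
    chars = len(text)
    nbytes = len(text.encode("utf-8"))
    print(f"{lang:9s} {chars} chars -> {nbytes} bytes ({nbytes / chars:.1f} bytes/char)")
```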
Cohere Labs (@cohere_labs)'s Twitter Profile Photo

How can we make language models more flexible to adapt to new languages after pretraining? 🌏

🧠 Our latest work investigates whether a tokenizer trained on more languages than the pretraining target can improve language plasticity without compromising pretraining performance.
Diana Abagyan (@dianaabagyan)'s Twitter Profile Photo

A huge thank you to all of my mentors and collaborators, especially Ahmet Üstün, Sara Hooker, Alejandro, and Marzieh Fadaee for their guidance and support ✨

📜 Check out our paper! arxiv.org/abs/2506.10766

Felipe Cruz-Salinas (@fffffelipec)'s Twitter Profile Photo

This is very cool. One of the reasons I think muP hasn't caught on is that it is not seamlessly integrated with torch. Optax can make some things annoying, but this one is nice :)
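For context on the torch-integration point: muP typically asks for width-dependent learning-rate (and init) scaling on hidden-layer weights, which in plain PyTorch usually means hand-building optimizer parameter groups. A rough sketch under simplified assumptions (the toy model, the "hidden" heuristic, and the single LR-scaling rule are all mine, not a full muP implementation):

```python
import torch

# Toy model for the sketch: treat the interior Linear layers as "hidden"
# (muP-scaled) and the embedding / output layers as "non-hidden".
model = torch.nn.Sequential(
    torch.nn.Embedding(1000, 256),
    torch.nn.Linear(256, 256),
    torch.nn.Linear(256, 256),
    torch.nn.Linear(256, 1000),
)

base_width, width = 64, 256
base_lr = 1e-3
width_mult = width / base_width   # how much wider the model is than the tuning proxy
hidden_lr = base_lr / width_mult  # simplified muP rule: hidden-layer LR shrinks with width

hidden, other = [], []
for name, p in model.named_parameters():
    # Crude heuristic for illustration: the interior layers' weight matrices are "hidden".
    (hidden if name in {"1.weight", "2.weight"} else other).append(p)

# Manual parameter groups: this per-group wiring is the kind of plumbing
# that makes muP feel bolted-on rather than built-in.
optimizer = torch.optim.AdamW([
    {"params": other,  "lr": base_lr},
    {"params": hidden, "lr": hidden_lr},
])
```

In Optax the analogous plumbing is to partition parameters by label and apply differently scaled transforms to each group, which is roughly the kind of integration the tweet is reacting to.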