Arthur Douillard (@ar_douillard)'s Twitter Profile
Arthur Douillard

@ar_douillard

distributed (diloco) + modularity (dipaco) + LLM @ DeepMind, continual learning PhD @ Sorbonne

ID: 4707240328

Link: https://arthurdouillard.com · Joined: 04-01-2016 18:19:47

3.3K Tweets

3.3K Followers

1.1K Following

samsja (@samsja19)'s Twitter Profile Photo

OpenDiloco update: I think we've hit the sweet spot with our latest experiments. We managed to match the baseline's loss with 200x less communication. The key was to trade a higher number of inner steps for more quantization on the pseudo-gradient. We are preparing a…
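To make the trade-off concrete, here is a minimal single-process sketch of the DiLoCo-style scheme the tweet describes: each worker runs many local inner steps, and only the quantized pseudo-gradient (the drift of its replica) is communicated. All names here (diloco_round, quantize_int8, make_loader) are illustrative assumptions, not OpenDiloco's actual API.

```python
# Sketch: DiLoCo-style round with int8-quantized pseudo-gradients.
# More inner steps -> fewer communication rounds; quantizing the
# pseudo-gradient shrinks each round's payload further.
import copy
import torch

def quantize_int8(t: torch.Tensor) -> torch.Tensor:
    # Uniform int8 quantize/dequantize: the lossy step that cuts bandwidth.
    scale = t.abs().max().clamp(min=1e-12) / 127.0
    return (t / scale).round().clamp(-127, 127) * scale

def diloco_round(global_model, make_loader, workers=2, inner_steps=50, outer_lr=0.7):
    ref = [p.detach().clone() for p in global_model.parameters()]
    avg_pseudo_grad = [torch.zeros_like(p) for p in ref]
    for w in range(workers):
        # Each worker trains a private replica with an inner optimizer (AdamW).
        local = copy.deepcopy(global_model)
        opt = torch.optim.AdamW(local.parameters(), lr=1e-3)
        data = make_loader(w)
        for _ in range(inner_steps):
            x, y = next(data)
            loss = torch.nn.functional.mse_loss(local(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Pseudo-gradient = initial params minus final params, quantized
        # before being "sent" to the other workers.
        for apg, p0, p1 in zip(avg_pseudo_grad, ref, local.parameters()):
            apg += quantize_int8(p0 - p1.detach()) / workers
    # Outer step on the averaged pseudo-gradient (the DiLoCo paper uses
    # Nesterov-momentum SGD here; plain SGD keeps the sketch short).
    with torch.no_grad():
        for p, p0, apg in zip(global_model.parameters(), ref, avg_pseudo_grad):
            p.copy_(p0 - outer_lr * apg)

if __name__ == "__main__":
    torch.manual_seed(0)
    model = torch.nn.Linear(8, 1)
    def make_loader(seed):
        g = torch.Generator().manual_seed(seed)
        while True:
            x = torch.randn(32, 8, generator=g)
            yield x, x.sum(dim=1, keepdim=True)
    diloco_round(model, make_loader)
```

The bandwidth saving compounds: raising inner_steps divides the number of exchanges, and int8 quantization divides the size of each exchange, which is how a figure like 200x less communication becomes plausible.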

Logan Kilpatrick (@officiallogank)'s Twitter Profile Photo

Today, we are rolling out three experimental models:
- A new smaller variant, Gemini 1.5 Flash-8B
- A stronger Gemini 1.5 Pro model (better on coding & complex prompts)
- A significantly improved Gemini 1.5 Flash model

Try them on aistudio.google.com, details in 🧵

lmsys.org (@lmsysorg)'s Twitter Profile Photo

Chatbot Arena update⚡!

The latest Gemini (Pro/Flash/Flash-8B) results are now live, with over 20K community votes!

Highlights:
- New Gemini-1.5-Flash (0827) makes a huge leap, climbing from #23 to #6 overall!
- New Gemini-1.5-Pro (0827) shows strong gains in coding, math over…
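For intuition about how such community votes become a leaderboard, here is a minimal Elo-style rating sketch. lmsys's actual pipeline fits a Bradley-Terry model with confidence intervals rather than running online Elo, and "model-x" below is a made-up opponent for illustration.

```python
# Sketch: turning pairwise votes into ratings with an Elo-style update.
def elo_update(ratings, winner, loser, k=32.0):
    ra, rb = ratings[winner], ratings[loser]
    # Expected probability that the winner beats the loser given current ratings.
    expected = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
    # Upsets (low expected) move the ratings more than predictable wins.
    ratings[winner] = ra + k * (1.0 - expected)
    ratings[loser] = rb - k * (1.0 - expected)

ratings = {"gemini-1.5-flash-0827": 1000.0, "model-x": 1000.0}
votes = [("gemini-1.5-flash-0827", "model-x")] * 3 + [("model-x", "gemini-1.5-flash-0827")]
for winner, loser in votes:
    elo_update(ratings, winner, loser)
print(ratings)  # the model winning most head-to-head votes climbs the board
```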
Arthur Douillard (@ar_douillard)'s Twitter Profile Photo

Recall when people were talking about the trillions-of-params OG GPT-4? That model was tbh insane. Now, on human preference thru lmsys, a small and ultra-fast Gemini Flash model would beat it. That's acceleration.

Soham De (@sohamde_)'s Twitter Profile Photo

Two months back, we released a 9B RecurrentGemma model, one of the strongest SSM-based language models out there, trained on 2T tokens! 

I finally updated arXiv with some of our results: arxiv.org/abs/2404.07839

Link to weights and code for our models in thread!
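For anyone wanting to try the released weights, here is a minimal sketch using Hugging Face transformers, which supports RecurrentGemma. The checkpoint id "google/recurrentgemma-9b" is an assumption based on Google's naming convention; the thread linked above has the authoritative weight and code links.

```python
# Sketch: loading a RecurrentGemma checkpoint and sampling a continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-9b"  # assumed checkpoint name, verify in the thread
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tok("Recurrent language models can", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```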