Arthur Douillard (@ar_douillard)'s Twitter Profile
Arthur Douillard

@ar_douillard

distributed (diloco) + modularity (dipaco) + LLM @ DeepMind, continual learning PhD @ Sorbonne

ID: 4707240328

Link: https://arthurdouillard.com · Joined: 04-01-2016 18:19:47

3.3K Tweets

3.3K Followers

1.1K Following

samsja (@samsja19)'s Twitter Profile Photo

OpenDiloco update: I think we've hit the sweet spot with our latest experiments. We managed to match the baseline's loss with 200x less communication. The key was to trade a higher number of inner steps for more quantization on the pseudo-gradient. We are preparing a…
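To make the trade-off concrete, here is a minimal single-process sketch of the DiLoCo-style scheme the tweet describes: each worker runs many local inner steps, and only the quantized pseudo-gradient (the drift of its replica) is communicated. All names here (diloco_round, quantize_int8, make_loader) are illustrative assumptions, not OpenDiloco's actual API.

```python
# Sketch: DiLoCo-style round with int8-quantized pseudo-gradients.
# More inner steps -> fewer communication rounds; quantizing the
# pseudo-gradient shrinks each round's payload further.
import copy
import torch

def quantize_int8(t: torch.Tensor) -> torch.Tensor:
    # Uniform int8 quantize/dequantize: the lossy step that cuts bandwidth.
    scale = t.abs().max().clamp(min=1e-12) / 127.0
    return (t / scale).round().clamp(-127, 127) * scale

def diloco_round(global_model, make_loader, workers=2, inner_steps=50, outer_lr=0.7):
    ref = [p.detach().clone() for p in global_model.parameters()]
    avg_pseudo_grad = [torch.zeros_like(p) for p in ref]
    for w in range(workers):
        # Each worker trains a private replica with an inner optimizer (AdamW).
        local = copy.deepcopy(global_model)
        opt = torch.optim.AdamW(local.parameters(), lr=1e-3)
        data = make_loader(w)
        for _ in range(inner_steps):
            x, y = next(data)
            loss = torch.nn.functional.mse_loss(local(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Pseudo-gradient = initial params minus final params, quantized
        # before being "sent" to the other workers.
        for apg, p0, p1 in zip(avg_pseudo_grad, ref, local.parameters()):
            apg += quantize_int8(p0 - p1.detach()) / workers
    # Outer step on the averaged pseudo-gradient (the DiLoCo paper uses
    # Nesterov-momentum SGD here; plain SGD keeps the sketch short).
    with torch.no_grad():
        for p, p0, apg in zip(global_model.parameters(), ref, avg_pseudo_grad):
            p.copy_(p0 - outer_lr * apg)

if __name__ == "__main__":
    torch.manual_seed(0)
    model = torch.nn.Linear(8, 1)
    def make_loader(seed):
        g = torch.Generator().manual_seed(seed)
        while True:
            x = torch.randn(32, 8, generator=g)
            yield x, x.sum(dim=1, keepdim=True)
    diloco_round(model, make_loader)
```

The bandwidth saving compounds: raising inner_steps divides the number of exchanges, and int8 quantization divides the size of each exchange, which is how a figure like 200x less communication becomes plausible.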

Logan Kilpatrick (@officiallogank)'s Twitter Profile Photo

Today, we are rolling out three experimental models:
- A new smaller variant, Gemini 1.5 Flash-8B
- A stronger Gemini 1.5 Pro model (better on coding & complex prompts)
- A significantly improved Gemini 1.5 Flash model

Try them on aistudio.google.com, details in 🧵

lmsys.org (@lmsysorg)'s Twitter Profile Photo

Chatbot Arena update⚡!

The latest Gemini (Pro/Flash/Flash-8B) results are now live, with over 20K community votes!

Highlights:
- New Gemini-1.5-Flash (0827) makes a huge leap, climbing from #23 to #6 overall!
- New Gemini-1.5-Pro (0827) shows strong gains in coding, math over…
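For intuition about how such community votes become a leaderboard, here is a minimal Elo-style rating sketch. lmsys's actual pipeline fits a Bradley-Terry model with confidence intervals rather than running online Elo, and "model-x" below is a made-up opponent for illustration.

```python
# Sketch: turning pairwise votes into ratings with an Elo-style update.
def elo_update(ratings, winner, loser, k=32.0):
    ra, rb = ratings[winner], ratings[loser]
    # Expected probability that the winner beats the loser given current ratings.
    expected = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
    # Upsets (low expected) move the ratings more than predictable wins.
    ratings[winner] = ra + k * (1.0 - expected)
    ratings[loser] = rb - k * (1.0 - expected)

ratings = {"gemini-1.5-flash-0827": 1000.0, "model-x": 1000.0}
votes = [("gemini-1.5-flash-0827", "model-x")] * 3 + [("model-x", "gemini-1.5-flash-0827")]
for winner, loser in votes:
    elo_update(ratings, winner, loser)
print(ratings)  # the model winning most head-to-head votes climbs the board
```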
Arthur Douillard (@ar_douillard)'s Twitter Profile Photo

Recall when people were talking about the trillions-of-params OG GPT-4? That model was tbh insane. Now, on human preference thru lmsys, a small and ultra-fast Gemini Flash model would beat it. That's acceleration.

Soham De (@sohamde_)'s Twitter Profile Photo

Two months back, we released a 9B RecurrentGemma model, one of the strongest SSM-based language models out there, trained on 2T tokens! 

I finally updated arXiv with some of our results: arxiv.org/abs/2404.07839

Link to weights and code for our models in thread!
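For anyone wanting to try the released weights, here is a minimal sketch using Hugging Face transformers, which supports RecurrentGemma. The checkpoint id "google/recurrentgemma-9b" is an assumption based on Google's naming convention; the thread linked above has the authoritative weight and code links.

```python
# Sketch: loading a RecurrentGemma checkpoint and sampling a continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-9b"  # assumed checkpoint name, verify in the thread
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tok("Recurrent language models can", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```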