Arthur Douillard (@ar_douillard) 's Twitter Profile
Arthur Douillard

@ar_douillard

distributed learning @ deepmind | DiLoCo, DiPaCo | world-wide compute arbitrage

ID: 4707240328

https://arthurdouillard.com/ · Joined: 04-01-2016 18:19:47

4.4K Tweets

6.6K Followers

1.1K Following

Rohan (@rohan_virani) 's Twitter Profile Photo

(4) Compute efficiency breakthroughs at training and inference. Stanford's work on benchmarks for CUDA kernel writing and "Cartridges" for 38x less memory consumption. DeepMind's DiLoCo enables distributed training with 500x less communication overhead - critical as models scale.

Tim Rocktäschel (@_rockt) 's Twitter Profile Photo

Great opportunity at Google Research for folks interested in AutoML, evolutionary methods, meta-learning, and open-endedness:  London: google.com/about/careers/… Zürich: google.com/about/careers/…

wh (@nrehiew_) 's Twitter Profile Photo

> AI Influencers say OpenAI has a "universal verifier" > ask if its just LLM as a judge > they don't understand > pull out Dec 2023 paper about LLM as a Judge > they laugh and say "its a universal verifier sir" > buy subscription to read article > it's LLM as a Judge

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵

Zach Mueller (@thezachmueller) 's Twitter Profile Photo

DiLoCo is a distributed-optimization method for training LLMs across slow or geographically separated networks. Each worker runs many local AdamW steps on its own data; only every ~500 steps do the workers send compact “pseudo-gradients” to a global Nesterov-momentum optimizer, which then updates the shared global weights before the next round of local training.
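
To make that loop concrete, here is a minimal single-process sketch in PyTorch, written from the description above rather than from the DiLoCo authors' code: `diloco_round`, the worker models/loaders, and all hyperparameters are illustrative placeholders.

```python
# Sketch of one DiLoCo communication round (single process, stand-in workers).
import copy
from itertools import cycle

import torch
from torch import nn


def diloco_round(global_model, worker_models, worker_loaders, outer_opt,
                 inner_steps=500, inner_lr=1e-4):
    """Each worker starts from the current global weights, runs `inner_steps`
    of local AdamW on its own data, then reports a pseudo-gradient
    (initial weights minus final local weights). The outer optimizer,
    e.g. SGD with nesterov=True, applies the averaged pseudo-gradient
    to the global weights."""
    global_state = copy.deepcopy(global_model.state_dict())
    pseudo_grads = [torch.zeros_like(p) for p in global_model.parameters()]

    for model, loader in zip(worker_models, worker_loaders):
        model.load_state_dict(global_state)       # sync worker with global copy
        inner_opt = torch.optim.AdamW(model.parameters(), lr=inner_lr)
        batches = cycle(loader)
        for _ in range(inner_steps):               # many local steps, no communication
            x, y = next(batches)
            loss = nn.functional.cross_entropy(model(x), y)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        # Pseudo-gradient: how far this worker drifted from the global weights.
        for acc, p_global, p_local in zip(pseudo_grads,
                                          global_model.parameters(),
                                          model.parameters()):
            acc += (p_global.detach() - p_local.detach()) / len(worker_models)

    # Outer step: treat the averaged drift as a gradient for the outer optimizer.
    outer_opt.zero_grad()
    for p, g in zip(global_model.parameters(), pseudo_grads):
        p.grad = g
    outer_opt.step()
```

In a real deployment each worker sits on its own cluster, so only the pseudo-gradient exchange every ~500 steps crosses the slow network; communicating once per round instead of once per step is where the large reduction in communication overhead comes from.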

Demis Hassabis (@demishassabis) 's Twitter Profile Photo

One word: relentless. Just in the past two weeks, we’ve shipped:
🌐 Genie 3 - the most advanced world simulator ever
🤔 Gemini 2.5 Pro Deep Think available to Ultra subs
🎓 Gemini Pro free for uni students & $1B for US ed
🌍 AlphaEarth - a geospatial model of the entire planet

Arthur Douillard (@ar_douillard) 's Twitter Profile Photo

The problem with "understanding" in deep learning is that it rarely holds in practice, as it requires far too many simplifying assumptions. I'd rather have a SotA model with some intuition for why it works than a toy model with perfect understanding 🤷

Arthur Douillard (@ar_douillard) 's Twitter Profile Photo

Imagine the rhythmic noise of an entire datacenter performing forward-backward & AR across thousands of GPUs. A sight to behold: the datacenter is breathing!