Arthur Douillard (@ar_douillard) 's Twitter Profile
Arthur Douillard

@ar_douillard

distributed learning @ deepmind | DiLoCo, DiPaCo | world-wide compute arbitrage

ID: 4707240328

https://arthurdouillard.com/ · Joined: 04-01-2016 18:19:47

4.4K Tweets

6.6K Followers

1.1K Following

Rohan (@rohan_virani) 's Twitter Profile Photo

(4) Compute efficiency breakthroughs at training and inference. Stanford's work on benchmarks for CUDA kernel writing and "Cartridges" for 38x less memory consumption. DeepMind's DiLoCo enables distributed training with 500x less communication overhead - critical as models scale.

Tim Rocktäschel (@_rockt) 's Twitter Profile Photo

Great opportunity at Google Research for folks interested in AutoML, evolutionary methods, meta-learning, and open-endedness:  London: google.com/about/careers/… Zürich: google.com/about/careers/…

wh (@nrehiew_) 's Twitter Profile Photo

> AI Influencers say OpenAI has a "universal verifier" > ask if its just LLM as a judge > they don't understand > pull out Dec 2023 paper about LLM as a Judge > they laugh and say "its a universal verifier sir" > buy subscription to read article > it's LLM as a Judge

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵

Zach Mueller (@thezachmueller) 's Twitter Profile Photo

DiLoCo is a distributed-optimization method for training LLMs across slow or geographically separated networks. Each worker runs many local AdamW steps on its own data; only every ~500 steps do the workers send compact “pseudo-gradients” to a global Nesterov-momentum optimizer, which then updates the shared global weights before the next round of local training.
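
To make that loop concrete, here is a minimal single-process sketch in PyTorch, written from the description above rather than from the DiLoCo authors' code: `diloco_round`, the worker models/loaders, and all hyperparameters are illustrative placeholders.

```python
# Sketch of one DiLoCo communication round (single process, stand-in workers).
import copy
from itertools import cycle

import torch
from torch import nn


def diloco_round(global_model, worker_models, worker_loaders, outer_opt,
                 inner_steps=500, inner_lr=1e-4):
    """Each worker starts from the current global weights, runs `inner_steps`
    of local AdamW on its own data, then reports a pseudo-gradient
    (initial weights minus final local weights). The outer optimizer,
    e.g. SGD with nesterov=True, applies the averaged pseudo-gradient
    to the global weights."""
    global_state = copy.deepcopy(global_model.state_dict())
    pseudo_grads = [torch.zeros_like(p) for p in global_model.parameters()]

    for model, loader in zip(worker_models, worker_loaders):
        model.load_state_dict(global_state)       # sync worker with global copy
        inner_opt = torch.optim.AdamW(model.parameters(), lr=inner_lr)
        batches = cycle(loader)
        for _ in range(inner_steps):               # many local steps, no communication
            x, y = next(batches)
            loss = nn.functional.cross_entropy(model(x), y)
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        # Pseudo-gradient: how far this worker drifted from the global weights.
        for acc, p_global, p_local in zip(pseudo_grads,
                                          global_model.parameters(),
                                          model.parameters()):
            acc += (p_global.detach() - p_local.detach()) / len(worker_models)

    # Outer step: treat the averaged drift as a gradient for the outer optimizer.
    outer_opt.zero_grad()
    for p, g in zip(global_model.parameters(), pseudo_grads):
        p.grad = g
    outer_opt.step()
```

In a real deployment each worker sits on its own cluster, so only the pseudo-gradient exchange every ~500 steps crosses the slow network; communicating once per round instead of once per step is where the large reduction in communication overhead comes from.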

Demis Hassabis (@demishassabis) 's Twitter Profile Photo

One word: relentless. Just in the past two weeks, we’ve shipped:
🌐 Genie 3 - the most advanced world simulator ever
🤔 Gemini 2.5 Pro Deep Think available to Ultra subs
🎓 Gemini Pro free for uni students & $1B for US ed
🌍 AlphaEarth - a geospatial model of the entire planet

Arthur Douillard (@ar_douillard) 's Twitter Profile Photo

The problem with "understanding" in deep learning is that it rarely holds in practice, as it requires far too many simplifying assumptions. I'd rather have a SotA model with some intuition for why it works than a toy model with perfect understanding 🤷

Arthur Douillard (@ar_douillard) 's Twitter Profile Photo

Imagine the rhythmic noise of an entire datacenter performing forward-backward & AR across thousands of GPUs. A sight to behold: the datacenter is breathing!