David J Wu (@lightvector1)'s Twitter Profile
David J Wu

@lightvector1

Researcher, game AI enthusiast, author of KataGo (katagotraining.org)

ID: 1317558974154158082

Joined: 17-10-2020 20:12:11

27 Tweets

503 Followers

59 Following

David J Wu (@lightvector1)'s Twitter Profile Photo

We have a new paper out! It is well-known that in many games the raw policy of an SL model can blunder in silly ways even after extensive training. Search seems to capture a component of human planning that deep neural nets have difficulty fitting or modeling on their own.

David J Wu (@lightvector1)'s Twitter Profile Photo

We know that search can be a powerful RL policy improvement method (e.g. search outperforms the raw policy by 2000 Elo in AlphaGo Zero!). One challenge is making this kind of RL robust while also remaining compatible with humans or other agents. Our work on how:
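The "search as policy improvement" idea can be sketched in miniature: re-weight a raw policy by exponentiated lookahead values, so that actions search reveals to be blunders lose probability mass. This is a toy illustration, not the method from the paper; the policy and values below are made up.

```python
import math

def improved_policy(raw_policy, action_values, temperature=1.0):
    """One-step 'search' improvement: re-weight the prior by exp(value / T).

    raw_policy:    dict action -> prior probability (the raw net policy)
    action_values: dict action -> value found by lookahead/search (hypothetical)
    """
    scores = {a: p * math.exp(action_values[a] / temperature)
              for a, p in raw_policy.items()}
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}

# Toy example: the raw policy slightly prefers a blunder ('b'),
# but lookahead values show 'a' is better, so search corrects it.
raw = {"a": 0.4, "b": 0.6}
values = {"a": 1.0, "b": 0.0}   # hypothetical searched values
improved = improved_policy(raw, values)
```

With a lower temperature the improved policy concentrates harder on the searched-best action, which is the same knob that makes search-based targets sharper than the raw prior.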

317070 (@317070)'s Twitter Profile Photo

Did you know, that you can build a virtual machine inside ChatGPT? And that you can use this machine to create files, program and even browse the internet? engraved.blog/building-a-vir…

Lex Fridman (@lexfridman)'s Twitter Profile Photo

Here's my conversation with Noam Brown, co-creator of AI systems that achieve superhuman-level performance in poker and Diplomacy, a game that involves strategic negotiation with humans. This was a fascinating, technical conversation. youtube.com/watch?v=2oHH4a…

Eugene Vinitsky 🍒🦋 (@eugenevinitsky)'s Twitter Profile Photo

What is off-belief learning, and how does it help us build agents that coordinate only in grounded ways? Part 1 of a new blog series on intuitive summaries of key ideas in multi-agent RL: eugenevinitsky.github.io/posts/Off-Beli…

Samuel Sokota (@ssokota)'s Twitter Profile Photo

There are two shapes below: one is named “kiki” and one is named “bouba”. Which is which? This is the puzzle we consider in our ICML paper, Learning Intuitive Policies Using Action Features. 1/N arxiv.org/abs/2201.12658

Leela Chess Zero (@leelachesszero)'s Twitter Profile Photo

In the recent paper arxiv.org/abs/2402.04494 Google DeepMind introduced a transformer chess network, but didn't include Lc0 in their comparison. We've used transformers for a while, and our network is stronger with fewer parameters. More details soon.

David J Wu (@lightvector1)'s Twitter Profile Photo

There are tons of articles on MCTS, which wastes compute whenever paths lead to the same state, but few on Monte-Carlo *Graph* Search, which doesn't. But implementing MCGS soundly can be tricky! Here's a doc on how to do it, and the theory behind it: github.com/lightvector/Ka…

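The core idea behind Monte-Carlo Graph Search can be sketched briefly: key search nodes by state rather than by path, so transpositions (different move orders reaching the same position) share one node's statistics. This toy sketch only shows the node sharing; the soundness subtleties the linked doc covers (how values propagate through a shared node) are omitted, and all names are illustrative.

```python
# Minimal sketch of node sharing in Monte-Carlo Graph Search.
# In plain MCTS the tree would hold separate nodes for each path;
# here a table keyed by state merges transpositions into one node.

class Node:
    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0

table = {}  # state -> Node, shared across all paths reaching that state

def get_node(state):
    # `state` must be hashable; transpositions map to the same Node.
    if state not in table:
        table[state] = Node()
    return table[state]

def backup(path_states, leaf_value):
    # Naive backup along a path; a sound MCGS must be more careful here.
    for s in path_states:
        n = get_node(s)
        n.visits += 1
        n.value_sum += leaf_value

# Two different move orders reaching the same position update one node:
backup(["root", "A", "AB"], 1.0)
backup(["root", "B", "AB"], 0.0)  # "AB" is a transposition of the first path
```

After both backups the shared "AB" node has accumulated statistics from both paths, which is exactly the compute-sharing that a path-keyed tree forfeits.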
Samuel Sokota (@ssokota)'s Twitter Profile Photo

SOTA AI for games like poker & Hanabi rely on search methods that don’t scale to games w/ large amounts of hidden information. In our ICLR paper, we introduce simple search methods that scale to large games & get SOTA for Hanabi w/ 100x less compute. 1/N arxiv.org/abs/2304.13138

David J Wu (@lightvector1)'s Twitter Profile Photo

Even though we've known since word2vec and much work after it that LLM representations correlate well with human concepts (in linear additivity, distance/clustering, and more), I still find it cool that this holds up with larger models so far. Lots of space to explore further.
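The "linear additivity" property can be illustrated with hand-crafted toy vectors: along a royalty axis and a gender axis, the classic king − man + woman ≈ queen analogy falls out of plain vector arithmetic and cosine similarity. These 2-D vectors are made up for illustration, not learned embeddings.

```python
import math

# Toy 2-D "embeddings": [royalty, male-ness]. Invented for illustration.
emb = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

# king - man + woman: subtract the male direction, add the female one.
analogy = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]

# Nearest word by cosine similarity:
best = max(emb, key=lambda word: cosine(emb[word], analogy))
```

In real learned embeddings the analogy only holds approximately, but the same nearest-neighbor query is how the word2vec-style results are typically measured.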

Thomas Ahle (@thomasahle)'s Twitter Profile Photo

I always found the tensor notation in Fast Matrix Multiplication algorithms confusing. But using tensor diagrams it's pretty easy to see what's going on:

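The bilinear maps those tensor diagrams depict can be made concrete with the best-known fast matrix multiplication scheme, Strassen's 2×2 algorithm: 7 scalar multiplications instead of the naive 8. This sketch is not from the thread; it just shows the kind of algorithm the tensor notation encodes.

```python
# Strassen's 2x2 scheme: 7 products p1..p7 reconstruct the full product,
# versus 8 products for the naive formula. Recursing on blocks gives the
# O(n^2.807) algorithm.

def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]

def naive_2x2(A, B):
    # Standard definition, 8 multiplications, for comparison.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

Each p_i is one rank-1 term in a rank-7 decomposition of the 2×2 matrix multiplication tensor, which is precisely the object the diagrams make easy to read off.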