David J Wu (@lightvector1)'s Twitter Profile
David J Wu

@lightvector1

Researcher, game AI enthusiast, author of KataGo (katagotraining.org)

ID: 1317558974154158082

Joined: 17-10-2020 20:12:11

27 Tweets

503 Followers

59 Following

David J Wu (@lightvector1)'s Twitter Profile Photo

We have a new paper out! It is well-known that in many games the raw policy of an SL model can blunder in silly ways even after extensive training. Search seems to capture a component of human planning that deep neural nets have difficulty fitting or modeling on their own.

David J Wu (@lightvector1)'s Twitter Profile Photo

We know that search can be a powerful RL policy improvement method (e.g. search outperforms the raw policy by 2000 Elo in AlphaGo Zero!). One challenge is making this kind of RL robust while also remaining compatible with humans or other agents. Our work on how:
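The "search as policy improvement" idea can be sketched in miniature: re-weight a raw policy by exponentiated lookahead values, so that actions search reveals to be blunders lose probability mass. This is a toy illustration, not the method from the paper; the policy and values below are made up.

```python
import math

def improved_policy(raw_policy, action_values, temperature=1.0):
    """One-step 'search' improvement: re-weight the prior by exp(value / T).

    raw_policy:    dict action -> prior probability (the raw net policy)
    action_values: dict action -> value found by lookahead/search (hypothetical)
    """
    scores = {a: p * math.exp(action_values[a] / temperature)
              for a, p in raw_policy.items()}
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}

# Toy example: the raw policy slightly prefers a blunder ('b'),
# but lookahead values show 'a' is better, so search corrects it.
raw = {"a": 0.4, "b": 0.6}
values = {"a": 1.0, "b": 0.0}   # hypothetical searched values
improved = improved_policy(raw, values)
```

With a lower temperature the improved policy concentrates harder on the searched-best action, which is the same knob that makes search-based targets sharper than the raw prior.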

317070 (@317070)'s Twitter Profile Photo

Did you know, that you can build a virtual machine inside ChatGPT? And that you can use this machine to create files, program and even browse the internet? engraved.blog/building-a-vir…

Lex Fridman (@lexfridman)'s Twitter Profile Photo

Here's my conversation with Noam Brown, co-creator of AI systems that achieve superhuman-level performance in poker and Diplomacy, a game that involves strategic negotiation with humans. This was a fascinating, technical conversation. youtube.com/watch?v=2oHH4a…

Eugene Vinitsky 🍒🦋 (@eugenevinitsky)'s Twitter Profile Photo

What is off-belief learning, and how does it help us build agents that coordinate only in grounded ways? Part 1 of a new blog series on intuitive summaries of key ideas in multi-agent RL: eugenevinitsky.github.io/posts/Off-Beli…

Samuel Sokota (@ssokota)'s Twitter Profile Photo

There are two shapes below: one is named “kiki” and one is named “bouba”. Which is which? This is the puzzle we consider in our ICML paper, Learning Intuitive Policies Using Action Features. 1/N arxiv.org/abs/2201.12658

Leela Chess Zero (@leelachesszero)'s Twitter Profile Photo

In the recent paper arxiv.org/abs/2402.04494 Google DeepMind introduced a transformer chess network, but didn't include Lc0 in their comparison. We've used transformers for a while, and our network is stronger with fewer parameters. More details soon.

David J Wu (@lightvector1)'s Twitter Profile Photo

There are tons of articles on MCTS, which wastes compute whenever paths lead to the same state, but few on Monte-Carlo *Graph* Search, which doesn't. But implementing MCGS soundly can be tricky! Here's a doc on how to do it, and the theory behind it: github.com/lightvector/Ka…

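The core idea behind Monte-Carlo Graph Search can be sketched briefly: key search nodes by state rather than by path, so transpositions (different move orders reaching the same position) share one node's statistics. This toy sketch only shows the node sharing; the soundness subtleties the linked doc covers (how values propagate through a shared node) are omitted, and all names are illustrative.

```python
# Minimal sketch of node sharing in Monte-Carlo Graph Search.
# In plain MCTS the tree would hold separate nodes for each path;
# here a table keyed by state merges transpositions into one node.

class Node:
    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0

table = {}  # state -> Node, shared across all paths reaching that state

def get_node(state):
    # `state` must be hashable; transpositions map to the same Node.
    if state not in table:
        table[state] = Node()
    return table[state]

def backup(path_states, leaf_value):
    # Naive backup along a path; a sound MCGS must be more careful here.
    for s in path_states:
        n = get_node(s)
        n.visits += 1
        n.value_sum += leaf_value

# Two different move orders reaching the same position update one node:
backup(["root", "A", "AB"], 1.0)
backup(["root", "B", "AB"], 0.0)  # "AB" is a transposition of the first path
```

After both backups the shared "AB" node has accumulated statistics from both paths, which is exactly the compute-sharing that a path-keyed tree forfeits.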
Samuel Sokota (@ssokota)'s Twitter Profile Photo

SOTA AI for games like poker & Hanabi rely on search methods that don’t scale to games w/ large amounts of hidden information. In our ICLR paper, we introduce simple search methods that scale to large games & get SOTA for Hanabi w/ 100x less compute. 1/N arxiv.org/abs/2304.13138

David J Wu (@lightvector1)'s Twitter Profile Photo

Even though we've known since word2vec and much work after it that LLM representations correlate well with human concepts (in linear additivity, distance/clustering, and more), I still find it cool that this holds up with larger models so far. Lots of space to explore further.
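The "linear additivity" property can be illustrated with hand-crafted toy vectors: along a royalty axis and a gender axis, the classic king − man + woman ≈ queen analogy falls out of plain vector arithmetic and cosine similarity. These 2-D vectors are made up for illustration, not learned embeddings.

```python
import math

# Toy 2-D "embeddings": [royalty, male-ness]. Invented for illustration.
emb = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv)

# king - man + woman: subtract the male direction, add the female one.
analogy = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]

# Nearest word by cosine similarity:
best = max(emb, key=lambda word: cosine(emb[word], analogy))
```

In real learned embeddings the analogy only holds approximately, but the same nearest-neighbor query is how the word2vec-style results are typically measured.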

Thomas Ahle (@thomasahle)'s Twitter Profile Photo

I always found the tensor notation in Fast Matrix Multiplication algorithms confusing. But using tensor diagrams it's pretty easy to see what's going on:

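The bilinear maps those tensor diagrams depict can be made concrete with the best-known fast matrix multiplication scheme, Strassen's 2×2 algorithm: 7 scalar multiplications instead of the naive 8. This sketch is not from the thread; it just shows the kind of algorithm the tensor notation encodes.

```python
# Strassen's 2x2 scheme: 7 products p1..p7 reconstruct the full product,
# versus 8 products for the naive formula. Recursing on blocks gives the
# O(n^2.807) algorithm.

def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]

def naive_2x2(A, B):
    # Standard definition, 8 multiplications, for comparison.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

Each p_i is one rank-1 term in a rank-7 decomposition of the 2×2 matrix multiplication tensor, which is precisely the object the diagrams make easy to read off.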