Greg Farquhar (@greg_far) Twitter Tweets • TwiCopy

Maximilian Igl

8 years ago

I am very excited to share our ICML paper “Deep Variational Reinforcement Learning (DVRL) for POMDPs”: Our agent learns a model of the environment and acts based on its belief state in this model. w/ @zinmalu, Tuan Anh Le, Frank Wood, Shimon Whiteson: arxiv.org/abs/1806.02426

122 likes, 0 replies, 34 retweets

Tim Rocktäschel

@_rockt

7 years ago

I had the pleasure to co-supervise outstanding MSc students jointly with Jakob Foerster and Greg Farquhar at Oxford Comp Sci this year. Together, we compiled our advice for embarking on short-term machine learning research projects: rockt.github.io/2018/08/29/msc…

269 likes, 3 replies, 88 retweets

Tim Rocktäschel

@_rockt

7 years ago

How can RL agents exploit the compositional, relational and hierarchical structure of the world? A growing number of authors propose learning from natural language. We are excited to share our IJCAI survey of this emerging field! arxiv.org/abs/1906.03926 TL;DR: 🤖+📖=📈🎯🏆🥳

252 likes, 2 replies, 72 retweets

Greg Farquhar

@greg_far

7 years ago

Progressively growing the action space creates a great curriculum for learning agents -- check out our paper: arxiv.org/abs/1906.12266 + code: github.com/TorchCraft/Tor…. Great working with Laura Gustafson, Zeming Lin, Shimon Whiteson, Nicolas Usunier, and Gabriel Synnaeve.

130 likes, 0 replies, 32 retweets

Greg Farquhar

@greg_far

6 years ago

AI accelerates by 10x in the hour it takes to repost from r/machinelearning to r/singularityisnear... just how near is it at that rate?? 😱

13 likes, 1 reply, 1 retweet

Noam Brown

@polynoamial

6 years ago

Tuomas Sandholm and I are doing a Reddit AMA now on the #Pluribus poker AI! reddit.com/r/MachineLearn…

25 likes, 0 replies, 4 retweets

Greg Farquhar

@greg_far

6 years ago

I particularly enjoyed visualising & analysing the learned mixing functions that combine per-agent utilities into joint values!

2 likes, 0 replies, 0 retweets

Tim Rocktäschel

@_rockt

6 years ago

I am proud to announce the release of the NetHack Learning Environment (NLE)! NetHack is an extremely difficult procedurally-generated grid-world dungeon-crawl game that strikes a great balance between complexity and speed for single-agent reinforcement learning research. 1/

704 likes, 14 replies, 188 retweets

Greg Farquhar

@greg_far

6 years ago

This is awesome, but I'm a little scared of how much time I might spend playing it myself...

7 likes, 0 replies, 0 retweets

Greg Farquhar

@greg_far

6 years ago

Permanent damage to generalisation from early updates in non-stationary training -- really enjoyed looking into this intriguing problem and trying to solve it for deep RL agents!

15 likes, 0 replies, 2 retweets

Jakob Foerster

@j_foerst

8 years ago

Excited to share "DiCE: The Infinitely Differentiable Monte Carlo Estimator": arxiv.org/abs/1802.05098 Try this one weird objective for correct any-order gradient estimators in all your stochastic graphs ;) With fantastic Oxford/CMU team: Greg Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Shimon Whiteson

233 likes, 3 replies, 76 retweets

Greg Farquhar

@greg_far

8 years ago

The camera-ready of our #ICLR2018 paper “TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning” is now online: arxiv.org/abs/1710.11417. Code is available at github.com/oxwhirl/treeqn/. With Tim Rocktäschel, Maximilian Igl, Shimon Whiteson, WhiRL

56 likes, 0 replies, 17 retweets

Shimon Whiteson

@shimon8282

8 years ago

Our latest paper: how to learn complex joint value functions for teams of agents whose greedy policies can be computed and executed in a decentralised fashion. The trick is a new monotonic value function factorisation. With results on StarCraft 2! arxiv.org/abs/1803.11485

98 likes, 0 replies, 32 retweets