Stephen McAleer (@mcaleerstephen)'s Twitter Profile
Stephen McAleer

@mcaleerstephen

Researching agent safety at OpenAI

ID: 2724167859

Link: https://www.andrew.cmu.edu/user/smcaleer/

Joined: 25-07-2014 14:26:05

699 Tweets

10.10K Followers

991 Following


Policy Space Response Oracles (PSRO) mixes over a population of deep RL policies to approximate a Nash equilibrium, but exploitability can increase from one iteration to the next. We introduce Anytime PSRO, which does not increase exploitability. arXiv: arxiv.org/abs/2201.07700
