
Yunhao (Robin) Tang
@robinphysics
Interested in RL. Now post-training Llama @AIatMeta. Prev post-training Gemini and RL research @Deepmind, PhD @Columbia
ID: 1059553685691342848
https://robintyh1.github.io/ 05-11-2018 21:10:59
121 Tweets
1.1K Followers
700 Following

Ok, folks: two papers this afternoon you shouldn't miss: Reincarnating Reinforcement Learning w/ Rishabh Agarwal, Max Schwarzer (#607). The Nature of Distributional TD Errors w/ Yunhao (Robin) Tang, Remi Munos (#531). 1/2



Beautiful piece of work with @markrowland_ai, Will Dabney et al. After 5 years, we have a proof that quantile TD learning converges! QTD works incredibly well but had defied analysis because its updates do not correspond to a contractive operator. See arxiv.org/abs/2301.04462


Interested in how non-contrastive representation learning works in RL? We show (1) why representations do not collapse, and (2) how it relates to gradient PCA / SVD of the transition matrix. Understanding Self-Predictive Learning for RL #ICML2023 Google DeepMind arxiv.org/pdf/2212.03319
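The self-predictive objective the tweet refers to can be sketched in a few lines: an encoder's representation of the current state is trained to predict the (stop-gradient) representation of the next state. A minimal numerical sketch, assuming illustrative names (`phi`, `pred_head`, `T`) that are not from the paper:

```python
import numpy as np

# Hypothetical sketch of a non-contrastive self-predictive loss in tabular form.
# phi: one representation vector per state; pred_head: latent prediction matrix;
# T: transition matrix. All names and shapes are illustrative assumptions.
rng = np.random.default_rng(0)
n_states, d = 8, 3
phi = rng.normal(size=(n_states, d))                  # state representations
pred_head = rng.normal(size=(d, d))                   # latent predictor
T = rng.dirichlet(np.ones(n_states), size=n_states)   # row-stochastic transitions

# Target: expected next-state features E_{s'|s}[phi(s')], treated as
# stop-gradient (no gradient flows through the target in training).
target = T @ phi

# Self-predictive loss: || phi(s) @ pred_head - target(s) ||^2, averaged over states.
loss = np.mean(np.sum((phi @ pred_head - target) ** 2, axis=1))
```

The paper's analysis concerns why minimizing a loss of this shape does not collapse `phi` to a constant and how its gradient dynamics relate to the SVD of the transition matrix; the snippet only fixes the objective being discussed.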


Even if all you want is a value function, using quantile TD (QTD) can give a better estimate than standard TD. Today at #ICML2023, Mark Rowland presents our latest work on distributional RL in collaboration with Yunhao (Robin) Tang, Clare Lyle, Remi Munos, Marc G. Bellemare #809 @ 2pm
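The QTD update behind this claim can be sketched concretely. A minimal tabular sketch of the generic quantile TD rule for one transition, not the paper's exact implementation; the step size and number of quantiles are illustrative assumptions:

```python
import numpy as np

def qtd_update(theta, theta_next, reward, gamma=0.99, lr=0.1):
    """One quantile TD (QTD) update for a single transition.

    theta:      m quantile estimates of the return at the current state.
    theta_next: m quantile estimates at the next state (bootstrap targets).
    """
    m = len(theta)
    # Quantile midpoints tau_i = (2i - 1) / (2m), i = 1..m.
    taus = (2.0 * np.arange(1, m + 1) - 1.0) / (2.0 * m)
    targets = reward + gamma * theta_next  # bootstrap samples r + gamma * theta'_j
    new_theta = theta.copy()
    for i in range(m):
        # Each quantile moves by lr * (tau_i - 1{target < theta_i}),
        # averaged over the bootstrap targets.
        grad = np.mean(taus[i] - (targets < theta[i]).astype(float))
        new_theta[i] = theta[i] + lr * grad
    return new_theta
```

Averaging the resulting quantiles gives a value estimate, which is the sense in which QTD can serve as a drop-in alternative even when only a scalar value function is needed.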


Fast-forward ⏩ alignment research from Google DeepMind! Our latest results enhance alignment outcomes in Large Language Models (LLMs). Presenting NashLLM!


