Yunhao (Robin) Tang (@robinphysics)'s Twitter Profile
Yunhao (Robin) Tang

@robinphysics

Interested in RL. Now post-training Llama @AIatMeta. Prev post-training Gemini and RL research @Deepmind, PhD @Columbia

ID: 1059553685691342848

Link: https://robintyh1.github.io/ · Joined: 05-11-2018 21:10:59

121 Tweets

1.1K Followers

700 Following

Marc G. Bellemare (@marcgbellemare)'s Twitter Profile Photo

Ok, folks: two papers this afternoon you shouldn't miss: Reincarnating Reinforcement Learning w/ Rishabh Agarwal, Max Schwarzer (#607). The Nature of Distributional TD Errors w/ Yunhao (Robin) Tang, Remi Munos (#531). 1/2

Yunhao (Robin) Tang (@robinphysics)'s Twitter Profile Photo

BYOL-Explore: Exploration by Bootstrapped Prediction. Check out a unified objective for representation learning and exploration for RL. Today 4pm #NeurIPS2022 Hall J #911
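
A minimal sketch of the idea above, under simplifying assumptions (all module names and sizes are illustrative, not BYOL-Explore's actual code): one bootstrapped latent-prediction error is used twice, as the representation-learning loss and, detached, as an intrinsic reward driving exploration.

```python
# Illustrative sketch of the BYOL-Explore objective (not the paper's code).
import torch
import torch.nn as nn

torch.manual_seed(0)
enc = nn.Linear(8, 16)           # online encoder (toy sizes)
pred = nn.Linear(16, 16)         # latent predictor / simplified world model
target_enc = nn.Linear(8, 16)    # bootstrapped target: EMA copy of the encoder
target_enc.load_state_dict(enc.state_dict())
for p in target_enc.parameters():
    p.requires_grad_(False)

obs, next_obs = torch.randn(64, 8), torch.randn(64, 8)  # dummy transitions

z_pred = pred(enc(obs))                    # predicted next-step latent
with torch.no_grad():
    z_tgt = target_enc(next_obs)           # bootstrapped target latent
per_step_err = (z_pred - z_tgt).pow(2).mean(dim=-1)

loss = per_step_err.mean()                 # representation-learning objective
intrinsic_reward = per_step_err.detach()   # same error, reused as exploration bonus
loss.backward()

with torch.no_grad():                      # EMA update of the target encoder
    for p_t, p_o in zip(target_enc.parameters(), enc.parameters()):
        p_t.mul_(0.99).add_(0.01 * p_o)
```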

Yunhao (Robin) Tang (@robinphysics)'s Twitter Profile Photo

The nature of TD error in distributional RL. Check out the fundamental differences between distributional and classic TD errors. Today 4pm #NeurIPS2022 Hall J #531 Say hi to Remi and Marc while you are there!

Marc G. Bellemare (@marcgbellemare)'s Twitter Profile Photo

Beautiful piece of work with @markrowland_ai, Will Dabney et al. After 5 years, we have a proof that quantile TD learning converges! QTD works incredibly well but had defied analysis because its updates don't correspond to any contractive operator. See arxiv.org/abs/2301.04462
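
For intuition on why QTD updates defy the usual contraction argument, here is a minimal tabular sketch (a toy construction, not the paper's setup): each quantile estimate takes an asymmetric up/down step whose direction depends on an indicator of the current estimates, so the expected update is not an affine map of the estimates.

```python
# Toy tabular quantile TD (QTD) on a two-state chain (illustrative only).
import numpy as np

m, gamma, alpha = 5, 0.9, 0.1
taus = (2 * np.arange(m) + 1) / (2 * m)   # quantile levels at midpoints
theta = np.zeros((2, m))                  # per-state quantile estimates

rng = np.random.default_rng(0)
for _ in range(5000):
    s = rng.integers(2)
    r, s_next = rng.normal(1.0), 1 - s    # toy reward and deterministic transition
    targets = r + gamma * theta[s_next]   # one TD target per next-state quantile
    for i in range(m):
        # asymmetric step: up with weight tau_i, down with weight 1 - tau_i;
        # the indicator makes the update nonlinear in theta
        theta[s, i] += alpha * np.mean(taus[i] - (targets < theta[s, i]))

print(theta)  # approximate return quantiles per state
```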

Pierre Richemond 🇪🇺 (@theonekloud)'s Twitter Profile Photo

NEW WORK: The last word on why BYOL works? In ‘The Edge of Orthogonality’ we give an exceedingly simple theory thanks to the optimal predictor being an orthogonal projection, connect BYOL to Riemannian SGD, and propose 4 new closed-form predictors! #ICML2023 arxiv.org/abs/2302.04817

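A rough reconstruction of the central claim, in my own notation rather than the paper's: for a linear predictor, the squared-loss-optimal choice is the ordinary least-squares solution, and the result advertised above is that at the BYOL optimum this matrix is an orthogonal projection.

```latex
\[
  W^{*} \;=\; \operatorname*{arg\,min}_{W}\;
      \mathbb{E}\,\lVert W z_{o} - z_{t} \rVert^{2}
  \;=\; \mathbb{E}[\,z_{t} z_{o}^{\top}\,]\;
        \mathbb{E}[\,z_{o} z_{o}^{\top}\,]^{-1}
\]
% z_o, z_t: online and target representations (my notation, not the paper's).
% Claimed property at the BYOL optimum: W^* is an orthogonal projection,
% i.e. symmetric and idempotent, W^* = (W^*)^T = (W^*)^2, which underpins
% the connection to Riemannian SGD.
```
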
Yunhao (Robin) Tang (@robinphysics)'s Twitter Profile Photo

Interested in how non-contrastive representation learning works in RL? We show
(1) Why representations do not collapse
(2) How it relates to gradient-based PCA / SVD of the transition matrix
Understanding Self-Predictive Learning for RL #ICML2023 Google DeepMind arxiv.org/pdf/2212.03319

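A minimal sketch of the self-predictive setup analyzed in the paper, in illustrative notation: representations are trained to predict a stop-gradient target built from the expected next-state representation, and (per the result above) the resulting dynamics behave like gradient-based PCA/SVD of the transition matrix, which is why they do not collapse.

```python
# Illustrative self-predictive learning on a toy tabular MDP (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
n_states, d, lr = 6, 3, 0.01
P = rng.dirichlet(np.ones(n_states), size=n_states)  # transition matrix
Phi = rng.normal(size=(n_states, d))                 # state representations
W = rng.normal(size=(d, d))                          # latent predictor

for _ in range(2000):
    target = P @ Phi                 # E[phi(s') | s], treated as stop-gradient
    err = Phi @ W - target           # latent prediction error
    grad_W = Phi.T @ err / n_states
    grad_Phi = err @ W.T / n_states  # no gradient flows through `target`
    W -= lr * grad_W
    Phi -= lr * grad_Phi

# If the PCA/SVD view holds, Phi keeps full rank (no collapse) and its span
# aligns with the top singular directions of P.
print(np.linalg.matrix_rank(Phi, tol=1e-6))
```
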
Will Dabney (@wwdabney)'s Twitter Profile Photo

Even if all you want is a value function, using quantile TD (QTD) can give a better estimate than standard TD. Today at #ICML2023, Mark Rowland presents our latest work on distributional RL in collaboration with Yunhao (Robin) Tang, Clare Lyle, Remi Munos, Marc G. Bellemare #809 @ 2pm
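
The mechanism behind the claim, in a minimal sketch (toy numbers, not the paper's experiments): with quantile levels at midpoints, the distributional mean is just the average of the quantile estimates, so QTD yields a value estimate for free.

```python
# Toy example: a value estimate recovered from quantile estimates.
import numpy as np

theta = np.array([-0.5, 0.2, 0.9, 1.4, 2.1])  # learned return quantiles (made up)
value_estimate = theta.mean()  # mean over midpoint quantiles = distributional mean
print(value_estimate)
```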

Yunhao (Robin) Tang (@robinphysics)'s Twitter Profile Photo

Interested in how **non-contrastive representation learning for RL** is magically equivalent to **gradient-based PCA/SVD on the transition matrix**, and hence won't collapse and captures spectral info about the transition? Come talk to us at #ICML2023 Hall 1 #308 at 1:30pm

Michal Valko (@misovalko)'s Twitter Profile Photo

Fast-forward ⏩ alignment research from Google DeepMind! Our latest results enhance alignment outcomes in Large Language Models (LLMs). Presenting NashLLM!

Yunhao (Robin) Tang (@robinphysics)'s Twitter Profile Photo

Thanks AK for promoting our work! Unlike regular RL, where a golden reward r(s,a) is available and online training is generally deemed better than offline, in RLHF this is less clear. Complementary to some concurrent work, we investigate causes of the performance gap between online and offline methods.

Zac Kenton (@zackenton1)'s Twitter Profile Photo

Eventually, humans will need to supervise superhuman AI - but how? Can we study it now? We don't have superhuman AI, but we do have LLMs. We study protocols where a weaker LLM uses stronger ones to find better answers than it knows itself. Does this work? It’s complicated: 🧵👇
