Yunhao (Robin) Tang (@robinphysics)'s Twitter Profile
Yunhao (Robin) Tang

@robinphysics

Interested in RL. Now post-training Llama @AIatMeta. Prev post-training Gemini and RL research @Deepmind, PhD @Columbia

ID: 1059553685691342848

Link: https://robintyh1.github.io/ · Joined: 05-11-2018 21:10:59

121 Tweets

1.1K Followers

700 Following

Marc G. Bellemare (@marcgbellemare)'s Twitter Profile Photo

Ok, folks: two papers this afternoon you shouldn't miss: Reincarnating Reinforcement Learning w/ Rishabh Agarwal, Max Schwarzer (#607). The Nature of Distributional TD Errors w/ Yunhao (Robin) Tang, Remi Munos (#531). 1/2

Yunhao (Robin) Tang (@robinphysics)'s Twitter Profile Photo

BYOL-Explore: Exploration by Bootstrapped Prediction. Check out a unified objective for representation learning and exploration for RL. Today 4pm #NeurIPS2022 Hall J #911
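
A minimal sketch of the idea above, under simplifying assumptions (all module names and sizes are illustrative, not BYOL-Explore's actual code): one bootstrapped latent-prediction error is used twice, as the representation-learning loss and, detached, as an intrinsic reward driving exploration.

```python
# Illustrative sketch of the BYOL-Explore objective (not the paper's code).
import torch
import torch.nn as nn

torch.manual_seed(0)
enc = nn.Linear(8, 16)           # online encoder (toy sizes)
pred = nn.Linear(16, 16)         # latent predictor / simplified world model
target_enc = nn.Linear(8, 16)    # bootstrapped target: EMA copy of the encoder
target_enc.load_state_dict(enc.state_dict())
for p in target_enc.parameters():
    p.requires_grad_(False)

obs, next_obs = torch.randn(64, 8), torch.randn(64, 8)  # dummy transitions

z_pred = pred(enc(obs))                    # predicted next-step latent
with torch.no_grad():
    z_tgt = target_enc(next_obs)           # bootstrapped target latent
per_step_err = (z_pred - z_tgt).pow(2).mean(dim=-1)

loss = per_step_err.mean()                 # representation-learning objective
intrinsic_reward = per_step_err.detach()   # same error, reused as exploration bonus
loss.backward()

with torch.no_grad():                      # EMA update of the target encoder
    for p_t, p_o in zip(target_enc.parameters(), enc.parameters()):
        p_t.mul_(0.99).add_(0.01 * p_o)
```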

Yunhao (Robin) Tang (@robinphysics)'s Twitter Profile Photo

The nature of TD error in distributional RL. Check out the fundamental differences between distributional and classic TD errors. Today 4pm #NeurIPS2022 Hall J #531 Say hi to Remi and Marc while you are there!

Marc G. Bellemare (@marcgbellemare)'s Twitter Profile Photo

Beautiful piece of work with @markrowland_ai, Will Dabney et al. After 5 years, we have a proof that quantile TD learning converges! QTD works incredibly well but had defied analysis because its updates don't correspond to any contractive operator. See arxiv.org/abs/2301.04462
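
For intuition on why QTD updates defy the usual contraction argument, here is a minimal tabular sketch (a toy construction, not the paper's setup): each quantile estimate takes an asymmetric up/down step whose direction depends on an indicator of the current estimates, so the expected update is not an affine map of the estimates.

```python
# Toy tabular quantile TD (QTD) on a two-state chain (illustrative only).
import numpy as np

m, gamma, alpha = 5, 0.9, 0.1
taus = (2 * np.arange(m) + 1) / (2 * m)   # quantile levels at midpoints
theta = np.zeros((2, m))                  # per-state quantile estimates

rng = np.random.default_rng(0)
for _ in range(5000):
    s = rng.integers(2)
    r, s_next = rng.normal(1.0), 1 - s    # toy reward and deterministic transition
    targets = r + gamma * theta[s_next]   # one TD target per next-state quantile
    for i in range(m):
        # asymmetric step: up with weight tau_i, down with weight 1 - tau_i;
        # the indicator makes the update nonlinear in theta
        theta[s, i] += alpha * np.mean(taus[i] - (targets < theta[s, i]))

print(theta)  # approximate return quantiles per state
```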

Pierre Richemond 🇪🇺 (@theonekloud)'s Twitter Profile Photo

NEW WORK: The last word on why BYOL works? In ‘The Edge of Orthogonality’ we give an exceedingly simple theory thanks to the optimal predictor being an orthogonal projection, connect BYOL to Riemannian SGD, and propose 4 new closed-form predictors! #ICML2023 arxiv.org/abs/2302.04817

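A rough reconstruction of the central claim, in my own notation rather than the paper's: for a linear predictor, the squared-loss-optimal choice is the ordinary least-squares solution, and the result advertised above is that at the BYOL optimum this matrix is an orthogonal projection.

```latex
\[
  W^{*} \;=\; \operatorname*{arg\,min}_{W}\;
      \mathbb{E}\,\lVert W z_{o} - z_{t} \rVert^{2}
  \;=\; \mathbb{E}[\,z_{t} z_{o}^{\top}\,]\;
        \mathbb{E}[\,z_{o} z_{o}^{\top}\,]^{-1}
\]
% z_o, z_t: online and target representations (my notation, not the paper's).
% Claimed property at the BYOL optimum: W^* is an orthogonal projection,
% i.e. symmetric and idempotent, W^* = (W^*)^T = (W^*)^2, which underpins
% the connection to Riemannian SGD.
```
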
Yunhao (Robin) Tang (@robinphysics)'s Twitter Profile Photo

Interested in how non-contrastive representation learning works in RL? We show
(1) Why representations do not collapse
(2) How it relates to gradient-based PCA / SVD of the transition matrix
Understanding Self-Predictive Learning for RL #ICML2023 Google DeepMind arxiv.org/pdf/2212.03319

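A minimal sketch of the self-predictive setup analyzed in the paper, in illustrative notation: representations are trained to predict a stop-gradient target built from the expected next-state representation, and (per the result above) the resulting dynamics behave like gradient-based PCA/SVD of the transition matrix, which is why they do not collapse.

```python
# Illustrative self-predictive learning on a toy tabular MDP (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
n_states, d, lr = 6, 3, 0.01
P = rng.dirichlet(np.ones(n_states), size=n_states)  # transition matrix
Phi = rng.normal(size=(n_states, d))                 # state representations
W = rng.normal(size=(d, d))                          # latent predictor

for _ in range(2000):
    target = P @ Phi                 # E[phi(s') | s], treated as stop-gradient
    err = Phi @ W - target           # latent prediction error
    grad_W = Phi.T @ err / n_states
    grad_Phi = err @ W.T / n_states  # no gradient flows through `target`
    W -= lr * grad_W
    Phi -= lr * grad_Phi

# If the PCA/SVD view holds, Phi keeps full rank (no collapse) and its span
# aligns with the top singular directions of P.
print(np.linalg.matrix_rank(Phi, tol=1e-6))
```
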
Will Dabney (@wwdabney)'s Twitter Profile Photo

Even if all you want is a value function, using quantile TD (QTD) can give a better estimate than standard TD. Today at #ICML2023, Mark Rowland presents our latest work on distributional RL in collaboration with Yunhao (Robin) Tang, Clare Lyle, Remi Munos, Marc G. Bellemare #809 @ 2pm
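
The mechanism behind the claim, in a minimal sketch (toy numbers, not the paper's experiments): with quantile levels at midpoints, the distributional mean is just the average of the quantile estimates, so QTD yields a value estimate for free.

```python
# Toy example: a value estimate recovered from quantile estimates.
import numpy as np

theta = np.array([-0.5, 0.2, 0.9, 1.4, 2.1])  # learned return quantiles (made up)
value_estimate = theta.mean()  # mean over midpoint quantiles = distributional mean
print(value_estimate)
```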

Yunhao (Robin) Tang (@robinphysics)'s Twitter Profile Photo

Interested in how **non-contrastive representation learning for RL** is magically equivalent to **gradient-based PCA/SVD on the transition matrix**, and hence won't collapse and captures spectral info about the transition? Come talk to us at #ICML2023 Hall 1 #308 at 1:30pm

Michal Valko (@misovalko)'s Twitter Profile Photo

Fast-forward ⏩ alignment research from Google DeepMind! Our latest results enhance alignment outcomes in Large Language Models (LLMs). Presenting NashLLM!

Yunhao (Robin) Tang (@robinphysics)'s Twitter Profile Photo

Thanks AK for promoting our work! Unlike regular RL, where a golden reward r(s,a) is available and online training is generally deemed better than offline, in RLHF this is less clear. Complementary to some concurrent work, we investigate causes of the performance gap between online and offline methods.

Zac Kenton (@zackenton1)'s Twitter Profile Photo

Eventually, humans will need to supervise superhuman AI - but how? Can we study it now? We don't have superhuman AI, but we do have LLMs. We study protocols where a weaker LLM uses stronger ones to find better answers than it knows itself. Does this work? It’s complicated: 🧵👇
