Patrick Yin (@patrickhyin)'s Twitter Profile
Patrick Yin

@patrickhyin

phd @uwcse, undergrad @berkeleyai

ID: 1664673670051270657

Link: http://patrickyin.me · Joined: 02-06-2023 16:41:55

16 Tweets

101 Followers

141 Following

Chongyi Zheng (@chongyiz1):

1/6 Self-supervised learning is wildly successful in CV and NLP; might goal-conditioned RL enable self-supervised RL? Over months of experiments, we found design decisions that significantly boost success rates, yield intriguing representations, and solve real robotic tasks from images.

Sergey Levine (@svlevine):

Contrastive RL provides a way to use contrastive learning methods to learn general-purpose goal-conditioned policies, uniting representation learning and RL. We recently got this working at scale with real robots! You can read more here: chongyi-zheng.github.io/stable_contras… A short 🧵👇

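As a rough sketch of the idea in the two threads above (not the papers' implementation; all module names are illustrative): the contrastive critic scores a state-action pair against a goal through an inner product of learned embeddings, trained with an InfoNCE-style loss in which the goal actually reached from that state-action pair is the positive and the other goals in the batch are negatives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveCritic(nn.Module):
    """Scores (state, action) against goals via an embedding inner product."""
    def __init__(self, state_dim, action_dim, goal_dim, embed_dim=64):
        super().__init__()
        self.sa_encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim))
        self.goal_encoder = nn.Sequential(
            nn.Linear(goal_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim))

    def forward(self, state, action, goal):
        phi = self.sa_encoder(torch.cat([state, action], dim=-1))  # (B, D)
        psi = self.goal_encoder(goal)                              # (B, D)
        return phi @ psi.T  # logits[i, j]: score of (s_i, a_i) against g_j

def infonce_loss(critic, state, action, reached_goal):
    # Row i's positive is goal i (reached from (s_i, a_i)); the other goals
    # in the batch serve as negatives.
    logits = critic(state, action, reached_goal)
    labels = torch.arange(logits.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)

# Smoke test with random data (shapes only).
B, S, A, G = 32, 10, 4, 10
critic = ContrastiveCritic(S, A, G)
loss = infonce_loss(critic, torch.randn(B, S), torch.randn(B, A), torch.randn(B, G))
loss.backward()
```

A goal-conditioned policy can then be trained to pick actions that maximize the critic's score for the commanded goal; see the linked page for the actual design decisions that make this stable at scale.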
Abhishek Gupta (@abhishekunique7):

So you want to do robotics tasks requiring dynamics information in the real world, but you don’t want the pain of real-world RL? In our work to be presented as an oral at ICLR 2024, Marius Memmel showed how we can do this via a real-to-sim-to-real policy learning approach. A 🧵 (1/7)
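To make the "real-to-sim" step concrete, here is a toy, hypothetical example rather than the paper's method: fit a single simulator parameter (friction of a 1D point mass) so that simulated rollouts match noisy real trajectories. RL would then run in the calibrated simulator before the policy is deployed back on the robot.

```python
import numpy as np

def simulate(friction, v0=1.0, steps=50, dt=0.1):
    """Velocity of a 1D point mass decelerating under friction."""
    v = np.empty(steps)
    v[0] = v0
    for t in range(1, steps):
        v[t] = v[t - 1] - friction * v[t - 1] * dt
    return v

# "Real" data generated with an unknown friction coefficient plus sensor noise.
rng = np.random.default_rng(0)
true_friction = 0.3
real = simulate(true_friction) + 0.01 * rng.standard_normal(50)

# Real-to-sim: grid-search the friction value that best explains the data.
candidates = np.linspace(0.0, 1.0, 201)
errors = [np.mean((simulate(f) - real) ** 2) for f in candidates]
fitted = candidates[int(np.argmin(errors))]
print(f"fitted friction ≈ {fitted:.3f} (true: {true_friction})")
# Sim, then sim-to-real: train a policy against simulate(fitted, ...) and deploy.
```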

Chuning Zhu (@chuning_zhu):

How can we train RL agents that transfer to any reward? In our NeurIPS paper DiSPO, we propose learning the distribution of successor features of a stationary dataset, which enables zero-shot transfer to arbitrary rewards without additional training! A thread 🧵 (1/9)
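For intuition on the successor-feature machinery DiSPO builds on (DiSPO itself models a distribution over successor features, which this toy omits): if rewards are linear in state features, r = φ(s)·w, then the value function factors into a reward-independent term ψ times w, so a new reward needs only a new w, with no retraining. A minimal numpy check under those assumptions:

```python
import numpy as np

gamma = 0.9
n_states, n_feats = 4, 3
rng = np.random.default_rng(1)

phi = rng.standard_normal((n_states, n_feats))       # state features
P = rng.dirichlet(np.ones(n_states), size=n_states)  # fixed policy's dynamics

# Successor features satisfy psi = phi + gamma * P @ psi: a linear system.
psi = np.linalg.solve(np.eye(n_states) - gamma * P, phi)

for _ in range(2):  # two arbitrary new rewards, zero additional training
    w = rng.standard_normal(n_feats)
    v_via_sf = psi @ w
    # Compare against solving the Bellman equation directly for r = phi @ w.
    v_direct = np.linalg.solve(np.eye(n_states) - gamma * P, phi @ w)
    assert np.allclose(v_via_sf, v_direct)
print("successor features recover the value of any linear reward zero-shot")
```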

Abhishek Gupta (@abhishekunique7):

How can we enable transferable decision-making for *any* reward zero-shot? MBRL is task-agnostic but suffers from compounding error, while MFRL is task-specific. We propose a new class of world models that transfers across tasks zero-shot and avoids compounding error! A 🧵 (1/9)
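As a toy illustration of the compounding-error failure mode mentioned above: rolling a learned one-step model out autoregressively lets a small per-step bias snowball with horizon. The dynamics and numbers here are illustrative only, not from the paper.

```python
A_true = 0.95   # true 1D linear dynamics: x_{t+1} = A * x_t
A_model = 0.97  # learned one-step model with a small bias
x_true, x_model = 1.0, 1.0
for t in range(1, 51):
    x_true *= A_true
    x_model *= A_model
    if t in (1, 10, 50):
        err = abs(x_model - x_true) / abs(x_true)
        print(f"horizon {t:2d}: relative rollout error = {err:.2f}")
# A ~2% one-step error grows to ~183% by horizon 50.
```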

Abhishek Gupta (@abhishekunique7):

In my experience, robot 'generalists' are often jacks of all trades but masters of none. When trained across multiple tasks and environments, robot policies fail to generalize robustly and effectively to each particular test setting. What if, at test time, we non-parametrically…

Marius Memmel (@memmelma):

Have some offline data lying around? Use it to robustify few-shot imitation learning! 🤖 STRAP 🎒 is a retrieval-based method that leverages semantic sub-trajectories in offline datasets to augment the training data. 🧵 1/6
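A bare-bones sketch of the retrieve-and-augment recipe (illustrative only: STRAP matches semantic sub-trajectories, whereas this sketch uses fixed-length windows and plain Euclidean nearest neighbors over raw states):

```python
import numpy as np

def windows(traj, k):
    """All length-k sub-trajectories of a (T, D) trajectory."""
    return [traj[i:i + k] for i in range(len(traj) - k + 1)]

def retrieve(demo, offline_trajs, k=5, top_n=3):
    """Return the top_n offline sub-trajectories closest to any demo window."""
    candidates = [w for traj in offline_trajs for w in windows(traj, k)]
    scored = []
    for cand in candidates:
        dist = min(np.linalg.norm(cand - w) for w in windows(demo, k))
        scored.append((dist, cand))
    scored.sort(key=lambda pair: pair[0])
    return [cand for _, cand in scored[:top_n]]

rng = np.random.default_rng(3)
demo = rng.standard_normal((20, 4))                  # one few-shot demo
offline = [rng.standard_normal((50, 4)) for _ in range(10)]
augmented = [demo] + retrieve(demo, offline)         # train on demo + retrieved
print(f"training set grew from 1 to {len(augmented)} (sub-)trajectories")
```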

Abhishek Gupta (@abhishekunique7):

So we did a bunch of projects with real-world reinforcement learning, but it was often too inefficient to be practical when training tabula rasa. This suggests we need better priors, but acquiring these from on-robot data can often be expensive as well. In our recent work, we show…

Chuning Zhu (@chuning_zhu):

Scaling imitation learning has been bottlenecked by the need for high-quality robot data, which are expensive to collect. But are we utilizing existing data to the fullest extent? A thread (1/11)