Patrick Yin (@patrickhyin)'s Twitter Profile
Patrick Yin

@patrickhyin

phd @uwcse, undergrad @berkeleyai

ID: 1664673670051270657

Link: http://patrickyin.me · Joined: 02-06-2023 16:41:55

16 Tweets

101 Followers

141 Following

Chongyi Zheng (@chongyiz1):

1/6 Self-supervised learning is wildly successful in CV and NLP; might goal-conditioned RL enable self-supervised RL? Over months of experiments, we found design decisions that significantly boost success rates, yield intriguing representations, and solve real robotic tasks from images.

Sergey Levine (@svlevine):

Contrastive RL provides a way to use contrastive learning methods to learn general-purpose goal-conditioned policies, uniting representation learning and RL. We recently got this working at scale with real robots! You can read more here: chongyi-zheng.github.io/stable_contras… A short 🧵👇

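As a rough sketch of the idea in the two threads above (not the papers' implementation; all module names are illustrative): the contrastive critic scores a state-action pair against a goal through an inner product of learned embeddings, trained with an InfoNCE-style loss in which the goal actually reached from that state-action pair is the positive and the other goals in the batch are negatives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveCritic(nn.Module):
    """Scores (state, action) against goals via an embedding inner product."""
    def __init__(self, state_dim, action_dim, goal_dim, embed_dim=64):
        super().__init__()
        self.sa_encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim))
        self.goal_encoder = nn.Sequential(
            nn.Linear(goal_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim))

    def forward(self, state, action, goal):
        phi = self.sa_encoder(torch.cat([state, action], dim=-1))  # (B, D)
        psi = self.goal_encoder(goal)                              # (B, D)
        return phi @ psi.T  # logits[i, j]: score of (s_i, a_i) against g_j

def infonce_loss(critic, state, action, reached_goal):
    # Row i's positive is goal i (reached from (s_i, a_i)); the other goals
    # in the batch serve as negatives.
    logits = critic(state, action, reached_goal)
    labels = torch.arange(logits.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)

# Smoke test with random data (shapes only).
B, S, A, G = 32, 10, 4, 10
critic = ContrastiveCritic(S, A, G)
loss = infonce_loss(critic, torch.randn(B, S), torch.randn(B, A), torch.randn(B, G))
loss.backward()
```

A goal-conditioned policy can then be trained to pick actions that maximize the critic's score for the commanded goal; see the linked page for the actual design decisions that make this stable at scale.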
Abhishek Gupta (@abhishekunique7):

So you want to do robotics tasks requiring dynamics information in the real world, but you don’t want the pain of real-world RL? In our work to be presented as an oral at ICLR 2024, Marius Memmel showed how we can do this via a real-to-sim-to-real policy learning approach. A 🧵 (1/7)
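To make the "real-to-sim" step concrete, here is a toy, hypothetical example rather than the paper's method: fit a single simulator parameter (friction of a 1D point mass) so that simulated rollouts match noisy real trajectories. RL would then run in the calibrated simulator before the policy is deployed back on the robot.

```python
import numpy as np

def simulate(friction, v0=1.0, steps=50, dt=0.1):
    """Velocity of a 1D point mass decelerating under friction."""
    v = np.empty(steps)
    v[0] = v0
    for t in range(1, steps):
        v[t] = v[t - 1] - friction * v[t - 1] * dt
    return v

# "Real" data generated with an unknown friction coefficient plus sensor noise.
rng = np.random.default_rng(0)
true_friction = 0.3
real = simulate(true_friction) + 0.01 * rng.standard_normal(50)

# Real-to-sim: grid-search the friction value that best explains the data.
candidates = np.linspace(0.0, 1.0, 201)
errors = [np.mean((simulate(f) - real) ** 2) for f in candidates]
fitted = candidates[int(np.argmin(errors))]
print(f"fitted friction ≈ {fitted:.3f} (true: {true_friction})")
# Sim, then sim-to-real: train a policy against simulate(fitted, ...) and deploy.
```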

Chuning Zhu (@chuning_zhu):

How can we train RL agents that transfer to any reward? In our NeurIPS paper DiSPO, we propose learning the distribution of successor features of a stationary dataset, which enables zero-shot transfer to arbitrary rewards without additional training! A thread 🧵 (1/9)
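For intuition on the successor-feature machinery DiSPO builds on (DiSPO itself models a distribution over successor features, which this toy omits): if rewards are linear in state features, r = φ(s)·w, then the value function factors into a reward-independent term ψ times w, so a new reward needs only a new w, with no retraining. A minimal numpy check under those assumptions:

```python
import numpy as np

gamma = 0.9
n_states, n_feats = 4, 3
rng = np.random.default_rng(1)

phi = rng.standard_normal((n_states, n_feats))       # state features
P = rng.dirichlet(np.ones(n_states), size=n_states)  # fixed policy's dynamics

# Successor features satisfy psi = phi + gamma * P @ psi: a linear system.
psi = np.linalg.solve(np.eye(n_states) - gamma * P, phi)

for _ in range(2):  # two arbitrary new rewards, zero additional training
    w = rng.standard_normal(n_feats)
    v_via_sf = psi @ w
    # Compare against solving the Bellman equation directly for r = phi @ w.
    v_direct = np.linalg.solve(np.eye(n_states) - gamma * P, phi @ w)
    assert np.allclose(v_via_sf, v_direct)
print("successor features recover the value of any linear reward zero-shot")
```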

Abhishek Gupta (@abhishekunique7):

How can we enable transferable decision-making for *any* reward zero-shot? MBRL is task-agnostic but suffers from compounding error, while MFRL is task-specific. We propose a new class of world models that transfers across tasks zero-shot and avoids compounding error! A 🧵 (1/9)
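As a toy illustration of the compounding-error failure mode mentioned above: rolling a learned one-step model out autoregressively lets a small per-step bias snowball with horizon. The dynamics and numbers here are illustrative only, not from the paper.

```python
A_true = 0.95   # true 1D linear dynamics: x_{t+1} = A * x_t
A_model = 0.97  # learned one-step model with a small bias
x_true, x_model = 1.0, 1.0
for t in range(1, 51):
    x_true *= A_true
    x_model *= A_model
    if t in (1, 10, 50):
        err = abs(x_model - x_true) / abs(x_true)
        print(f"horizon {t:2d}: relative rollout error = {err:.2f}")
# A ~2% one-step error grows to ~183% by horizon 50.
```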

Abhishek Gupta (@abhishekunique7):

In my experience, robot 'generalists' are often jacks of all trades but masters of none. When trained across multiple tasks and environments, robot policies fail to generalize robustly and effectively to each particular test setting. What if, at test time, we non-parametrically…

Marius Memmel (@memmelma):

Have some offline data lying around? Use it to robustify few-shot imitation learning! 🤖 STRAP 🎒 is a retrieval-based method that leverages semantic sub-trajectories in offline datasets to augment the training data. 🧵 1/6
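A bare-bones sketch of the retrieve-and-augment recipe (illustrative only: STRAP matches semantic sub-trajectories, whereas this sketch uses fixed-length windows and plain Euclidean nearest neighbors over raw states):

```python
import numpy as np

def windows(traj, k):
    """All length-k sub-trajectories of a (T, D) trajectory."""
    return [traj[i:i + k] for i in range(len(traj) - k + 1)]

def retrieve(demo, offline_trajs, k=5, top_n=3):
    """Return the top_n offline sub-trajectories closest to any demo window."""
    candidates = [w for traj in offline_trajs for w in windows(traj, k)]
    scored = []
    for cand in candidates:
        dist = min(np.linalg.norm(cand - w) for w in windows(demo, k))
        scored.append((dist, cand))
    scored.sort(key=lambda pair: pair[0])
    return [cand for _, cand in scored[:top_n]]

rng = np.random.default_rng(3)
demo = rng.standard_normal((20, 4))                  # one few-shot demo
offline = [rng.standard_normal((50, 4)) for _ in range(10)]
augmented = [demo] + retrieve(demo, offline)         # train on demo + retrieved
print(f"training set grew from 1 to {len(augmented)} (sub-)trajectories")
```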

Abhishek Gupta (@abhishekunique7):

So we did a bunch of projects with real-world reinforcement learning, but it was often too inefficient to be practical when training tabula rasa. This suggests we need better priors, but acquiring these from on-robot data can often be expensive as well. In our recent work, we show…

Chuning Zhu (@chuning_zhu):

Scaling imitation learning has been bottlenecked by the need for high-quality robot data, which are expensive to collect. But are we utilizing existing data to the fullest extent? A thread (1/11)