@jesse_y_zhang: Given only successful trajectories, how do we learn to reward unsuccessful rollouts and generalize across tasks? We train with video rewinding, instruction augmentation, and OXE data! For rewinding, we randomly reverse videos to learn to predict decreasing rewards. (3/N)

Jesse Zhang

@jesse_y_zhang

PhD Student at USC focused on Robotics/Deep RL. Advisors: Erdem Biyik, Jesse Thomason, Joseph J. Lim. Focused on scalable, sample-efficient robot adaptation.

ID: 1341621435136032770

Website: https://jessezhang.net · Joined: 23-12-2020 05:47:43

91 Tweets

729 Followers

374 Following

6 months ago

Given only successful trajectories, how do we learn to reward unsuccessful rollouts and generalize across tasks? We train with video rewinding, instruction augmentation, and OXE data! For rewinding, we randomly reverse videos to learn to predict decreasing rewards. (3/N)
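
To make the rewinding idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each successful video is a list of frames labeled with linearly increasing progress from 0 to 1, and that a "rewind" plays the video forward to a random frame and then walks some frames back, so the progress labels fall. The function name `rewind_augment`, the `p_rewind` parameter, and the linear labeling are all assumptions made for illustration.

```python
import random


def rewind_augment(frames, rng, p_rewind=0.5):
    """Return (frames, progress) for one training sample.

    Hypothetical helper: with probability p_rewind, play the video forward
    to a random peak frame and then replay earlier frames in reverse, so
    the progress labels decrease after the peak.
    """
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames")

    # Assumed labeling: progress rises linearly from 0 (first) to 1 (last).
    progress = [i / (n - 1) for i in range(n)]

    # Leave the successful video untouched some of the time.
    if rng.random() >= p_rewind:
        return list(frames), progress

    peak = rng.randrange(1, n)         # last frame reached before rewinding
    k = rng.randrange(1, peak + 1)     # how many frames to walk back
    forward = list(range(peak + 1))
    backward = list(range(peak - 1, peak - 1 - k, -1))
    idx = forward + backward

    return [frames[i] for i in idx], [progress[i] for i in idx]


if __name__ == "__main__":
    video = [f"frame_{i}" for i in range(6)]
    clip, labels = rewind_augment(video, random.Random(0), p_rewind=1.0)
    for frame, label in zip(clip, labels):
        print(frame, round(label, 2))
```

A reward model trained on such samples sees rising targets on the forward segment and falling targets on the rewound segment, which is one way to get "decreasing rewards" from data that contains only successes.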

12 likes · 2 replies · 3 reposts