GANWANSHUI
@woson12
Ph.D. student at the University of Tokyo, ganwanshui.github.io
ID: 1150021182722232320
13-07-2019 12:36:31
198 Tweets
40 Followers
451 Following
Most past work throws human data into a pretraining mix. EgoMimic showed that, with proper alignment, you can co-train with human data. In his internship project at Pi, Simar Kareer took this a step further and showed that human data can "post-train" VLAs. This enables robots
Excited to share Large Video Planner (LVP) -- an open-source video-based robot foundation model trained at the Kempner Institute at Harvard University that can zero-shot generalize across both domains and robots. Through third-party evals, LVP outperforms both SOTA VLAs and video models across novel tasks/robots!
In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision - mapping imagery to intermediate representations (3D, flow, segmentation...) - is about to go away. vincentsitzmann.com/blog/bitter_le…