Yuliang Guo (@33yuliangguo) 's Twitter Profile
Yuliang Guo

@33yuliangguo

3D Vision | GenAI | Robotics @Bosch Research Silicon Valley. Previously PhD @BrownUniversity.

ID: 1709699780979679232

Link: https://yuliangguo.github.io | Joined: 04-10-2023 22:39:45

47 Tweets

57 Followers

150 Following

World Labs (@theworldlabs) 's Twitter Profile Photo

Introducing RTFM (Real-Time Frame Model): a highly efficient World Model that generates video frames in real time as you interact with it, powered by a single H100 GPU. RTFM renders persistent and 3D consistent worlds, both real and imaginary. Try our demo of RTFM today!

Abhinav Kumar (@abhinav1kumar) 's Twitter Profile Photo

Joint work with amazing team of Yuliang Guo (Bosch Research North America), Zhihao Zhang (MSU), Xinyu Huang (Bosch Research North America), Liu Ren (Bosch Research North America) and Xiaoming Liu (MSU) (4/N)

Google Research (@googleresearch) 's Twitter Profile Photo

Today at #NeurIPS2025, we present Titans, a new architecture that combines the speed of RNNs with the performance of Transformers. It uses deep neural memory to learn in real-time, effectively scaling to contexts larger than 2 million tokens. More at: goo.gle/3Kd5ojF

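The "deep neural memory" that learns in real time can be illustrated with a toy sketch. This is my own simplification, not the actual Titans code: the memory here is just a linear associative map whose weights take online gradient steps on the recall error, so it keeps adapting as context streams in at test time.

```python
import numpy as np

class NeuralMemory:
    """Toy sketch of a test-time-learned memory (hypothetical, not Titans itself).

    A linear associative memory W is updated by online gradient descent on
    the recall error ||W k - v||^2, so it 'learns' key/value pairs from the
    incoming stream without any offline training phase.
    """

    def __init__(self, dim: int, lr: float = 0.1):
        self.W = np.zeros((dim, dim))
        self.lr = lr

    def read(self, key: np.ndarray) -> np.ndarray:
        # Recall the value currently associated with this key.
        return self.W @ key

    def write(self, key: np.ndarray, value: np.ndarray) -> float:
        # One gradient step: the recall error (the 'surprise') drives
        # how strongly this pair gets written into memory.
        err = self.W @ key - value
        self.W -= self.lr * np.outer(err, key)
        return float(err @ err)
```

Repeated writes of the same pair drive the recall error toward zero; in the full architecture the memory is a deep MLP rather than a single matrix, which is what lets it scale to very long contexts.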
Ivan Skorokhodov (@isskoro) 's Twitter Profile Photo

I think JiT (arxiv.org/abs/2511.13720) might have been my favorite paper of 2025. From discussions with my friends, it drew quite a bit of controversy, with many people dismissing it as a trivial reinvention of x-prediction, so I would like to put my perspective on it here

Chelsea Finn (@chelseabfinn) 's Twitter Profile Photo

Studying generalist reward models is hard: robot datasets focus on successful demos, not failures.

We introduce:
- a large-scale reward modeling benchmark
- a data augmentation scheme
- a generalist reward model that outperforms frontier VLMs

Paper: arxiv.org/abs/2601.00675
Yuliang Guo (@33yuliangguo) 's Twitter Profile Photo

👀 Found an interesting signal hidden in the evolution of #GR00t VLA pre-training data:

Facts
• Human videos removed after N1.5
• World-model generated data (DreamGen) removed in N1.6

Takeaway?
So far, neither matches real, targeted robot data at scale.
👉 Data quality
Yuliang Guo (@33yuliangguo) 's Twitter Profile Photo

Really amazing to see policy, world model, and value function unified in one model, with SoTA performance shown in practice. Can't wait to check whether the three are truly aligned at deployment, and how such alignment affects the final performance in the real world

Yuliang Guo (@33yuliangguo) 's Twitter Profile Photo

It’s truly an honor to co-organize such an exciting workshop at the upcoming CVPR. Huge thanks to our co-organizers for making this happen, and sincere appreciation to our all-star speakers for accepting our invitations.

Yuliang Guo (@33yuliangguo) 's Twitter Profile Photo

It’s great to see increasing appreciation for 360° video generators. In fact, they even offer additional benefits under controlled trajectories: complete scenes can be generated with significantly simplified trajectories, reducing the need for long paths and mitigating the

Abhishek Gupta (@abhishekunique7) 's Twitter Profile Photo

Check out new work from Entong Su on RL fine-tuning pre-trained flow policies with residual flow steering. The motivation is simple: steering input diffusion noise can struggle to handle higher-dexterity problems like multi-fingered hands, because the base policy may not cover
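The contrast between steering input noise and steering the flow itself can be sketched in a few lines. This is a hypothetical toy, not the paper's method: a frozen base velocity field is integrated with Euler steps, and an optional residual field (the part RL fine-tuning would learn) adds corrections in velocity space at every step, rather than only perturbing the initial noise.

```python
import numpy as np

def base_velocity(x: np.ndarray, t: float) -> np.ndarray:
    """Stand-in for a pre-trained flow policy's velocity field
    (hypothetical: it simply pushes samples toward a fixed target action)."""
    target = np.array([1.0, -0.5])
    return target - x

def sample_action(x0: np.ndarray, residual=None, steps: int = 10) -> np.ndarray:
    """Euler integration of the flow ODE from initial noise x0.

    `residual` is an optional correction field; when None this reduces to
    the frozen base policy, otherwise the residual steers the trajectory
    at every integration step, not just at the input.
    """
    x, dt = x0.copy(), 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = base_velocity(x, t)
        if residual is not None:
            v = v + residual(x, t)  # steer in velocity space
        x = x + dt * v
    return x
```

Because the residual acts inside the integration loop, it can reach actions outside the base policy's support, which pure input-noise steering cannot.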

Vincent Sitzmann (@vincesitzmann) 's Twitter Profile Photo

In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision - mapping imagery to intermediate representations (3D, flow, segmentation...) is about to go away. vincentsitzmann.com/blog/bitter_le…

Yuliang Guo (@33yuliangguo) 's Twitter Profile Photo

As a 3D vision researcher, it is indeed painful to realize: robot intelligence may not benefit from explicit 3D representations. Besides Vincent Sitzmann's deep insights, in practice there could be two additional bottlenecks in adopting explicit 3D in perception–action

Zixun Huang (@zixun_h) 's Twitter Profile Photo

🚀 Excited to share our latest ICLR 2026 work 3DGEER (3D Gaussian Exact and Efficient Rendering) — now open sourced! 🔗 Code github.com/boschresearch/… 🔗 gsplat integration github.com/boschresearch/…