Jacob Yeung (@jacobyeung)'s Twitter Profile
ID: 1450257288

23-05-2013 01:32:22

0 Tweets

2 Followers

17 Following

Gabriel Sarch (@gabrielsarch)

How can we get VLMs to move their eyes, and reason step-by-step in visually grounded ways? 👀 We introduce ViGoRL, an RL method that anchors reasoning to image regions. 🎯 It outperforms vanilla GRPO and SFT across grounding, spatial tasks, and visual search (86.4% on V*). 👇🧵

Elliott / Shangzhe Wu (@elliottszwu)

This was a really fun and exciting workshop #CVPR2025! Huge thanks to all the speakers, organizers and reviewers @CVPR! We hope to be able to release the video recordings soon!

Jennifer Hsia (@jen_hsia)

1/6 Retrieval is supposed to improve generation in RAG systems. But in practice, adding more documents can hurt performance, even when relevant ones are retrieved. We introduce RAGGED, a framework to measure and diagnose when retrieval helps and when it hurts.

Tarasha Khurana (@tarashakhurana)

Excited to share recent work with Kaihua Chen and Deva Ramanan where we learn to do novel view synthesis for dynamic scenes in a self-supervised manner, only from 2D videos! webpage: cog-nvs.github.io arxiv: arxiv.org/abs/2507.12646 code (soon): github.com/Kaihua-Chen/co…

Nikhil Keetha (@nik__v__)

Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art