Chris Rockwell (@_crockwell) 's Twitter Profile
Chris Rockwell

@_crockwell

PhD student in #ComputerVision at @UmichCSE
Views are my own.

ID: 1247903510

Website: http://crockwell.github.io | Joined: 07-03-2013 04:29:00

122 Tweets

534 Followers

627 Following

Ang Cao (@angcao3) 's Twitter Profile Photo

Lightplane gives 1000x memory saving for differentiable rendering and feature splatting (i.e. unprojecting 2D features to 3D), which is generalizable to a variety of 3D structures. We hope it could solve memory bottleneck in current 3D pipeline and contribute to 3D research.
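
Feature splatting here means unprojecting per-pixel 2D features along camera rays into a 3D structure. Below is a minimal NumPy sketch of that operation for a simple voxel grid; it is not Lightplane's actual API or its memory-efficient implementation, and all function and parameter names are illustrative.

```python
import numpy as np

def unproject_features(feat, depth, K, grid_size=32, extent=4.0):
    """Splat 2D features into a 3D voxel grid (nearest voxel, camera frame).

    feat  : (H, W, C) per-pixel features
    depth : (H, W) metric depth
    K     : (3, 3) camera intrinsics
    """
    H, W, C = feat.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Back-project each pixel to a camera-frame 3D point:
    # x = (u - cx) * z / fx, y = (v - cy) * z / fy
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # Map points into voxel indices over a cube [-extent/2, extent/2]^3
    idx = np.floor((pts / extent + 0.5) * grid_size).astype(int)
    valid = np.all((idx >= 0) & (idx < grid_size), axis=1)
    idx, f = idx[valid], feat.reshape(-1, C)[valid]

    # Accumulate features and counts, then average per occupied voxel
    grid = np.zeros((grid_size,) * 3 + (C,))
    count = np.zeros((grid_size,) * 3 + (1,))
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), f)
    np.add.at(count, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return grid / np.maximum(count, 1.0)
```

The memory bottleneck Lightplane targets comes from differentiating through exactly this kind of dense accumulation at scale; the naive version above materializes the full grid and all intermediates.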

Tiange Luo (@tiangeluo) 's Twitter Profile Photo

We've curated a 1-million 3D-Captioning dataset for Objaverse(-XL), correcting 200k potential misalignments in the original Cap3D captions. Our method employs a pre-trained text-to-3D model to rank rendered views and utilizes GPT-4 Vision. Each caption is linked to a point

Chris Rockwell (@_crockwell) 's Twitter Profile Photo

Excited to present our #CVPR2024 *Highlight* FAR on Friday at 10:30 a.m., Arch 4A-E, Poster #31. Please feel free to stop by! FAR significantly improves correspondence-based methods using end-to-end pose prediction, making it applicable to many SOTA approaches!

Sarah Jabbour (@sarahjabbour_) 's Twitter Profile Photo

📢 Presenting 𝐃𝐄𝐏𝐈𝐂𝐓: Diffusion-Enabled Permutation Importance for Image Classification Tasks #ECCV2024

We use permutation importance to compute dataset-level explanations for image classifiers using diffusion models (without access to model parameters or training data!)

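
For reference, the underlying statistic is classic permutation importance: a feature matters to the extent that destroying it degrades the model's metric. A minimal tabular sketch follows; DEPICT's contribution is doing this at the dataset level for images, using a diffusion model to "permute" a visual concept across images, which this toy column-shuffle version does not implement.

```python
import numpy as np

def permutation_importance(model, X, y, metric, n_repeats=5, seed=0):
    """Importance of feature j = drop in metric when column j is shuffled.

    model  : callable X -> predictions (black box, no parameter access)
    metric : callable (y_true, y_pred) -> score (higher is better)
    """
    rng = np.random.default_rng(seed)
    base = metric(y, model(X))
    importances = []
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # destroy feature j's signal
            drops.append(base - metric(y, model(Xp)))
        importances.append(np.mean(drops))
    return np.array(importances)
```

Note the model is only queried, never opened, which is why the approach needs no access to parameters or training data.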
Aayan Yadav (@ionydv) 's Twitter Profile Photo

📢 Introducing our #ECCV2024 work, COCO-ReM (COCO Refined Masks), for more reliable benchmarking of object detectors, crucial for the future of object detection research.

Paper: arxiv.org/abs/2403.18819
Code: arxiv.org/abs/2403.18819
Website: cocorem.xyz

Ayush Shrivastava (@ayshrv) 's Twitter Profile Photo

We present Global Matching Random Walks, a simple self-supervised approach to the Tracking Any Point (TAP) problem, accepted to #ECCV2024. We train a global matching transformer to find cycle consistent tracks through video via contrastive random walks (CRW).
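
The cycle-consistency idea behind contrastive random walks can be sketched in a few lines: chain row-stochastic transition matrices forward through the frames and back, and penalize round-trip probability mass that does not return to the starting node. This is a generic NumPy illustration of the CRW objective, not the paper's global matching transformer.

```python
import numpy as np

def softmax(A, axis=-1):
    A = A - A.max(axis=axis, keepdims=True)
    e = np.exp(A)
    return e / e.sum(axis=axis, keepdims=True)

def cycle_walk_loss(embs, temp=0.07):
    """Contrastive random walk loss over a palindromic frame sequence.

    embs: list of (N, D) node embeddings, one per frame.
    Each node should walk forward through all frames, back again, and
    land on itself; cross-entropy against the identity enforces this.
    """
    N = embs[0].shape[0]
    P = np.eye(N)
    # Forward walk: frame 0 -> T (row-softmax of pairwise affinities)
    for a, b in zip(embs[:-1], embs[1:]):
        P = P @ softmax(a @ b.T / temp)
    # Backward walk: frame T -> 0
    rev = embs[::-1]
    for a, b in zip(rev[:-1], rev[1:]):
        P = P @ softmax(a @ b.T / temp)
    # Round-trip probability of returning to the start node
    return -np.mean(np.log(np.diag(P) + 1e-9))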

Daniel Geng (@dangengdg) 's Twitter Profile Photo

What happens when you train a video generation model to be conditioned on motion? Turns out you can perform "motion prompting," just like you might prompt an LLM! Doing so enables many different capabilities. Here are a few examples; check out this thread 🧵 for more results!

Linyi Jin (@jin_linyi) 's Twitter Profile Photo

Introducing 👀Stereo4D👀 A method for mining 4D from internet stereo videos. It enables large-scale, high-quality, dynamic, *metric* 3D reconstructions, with camera poses and long-term 3D motion trajectories. We used Stereo4D to make a dataset of over 100k real-world 4D scenes.

Yen-Chen Lin (@yen_chen_lin) 's Twitter Profile Photo

Video generation models exploded onto the scene in 2024, sparked by the release of Sora from OpenAI. I wrote a blog post on key techniques that are used in building large video generation models: yenchenlin.me/blog/2025/01/0…

Chen-Hsuan Lin (@chenhsuanlin) 's Twitter Profile Photo

Cameras are key to modeling our dynamic 3D visual world. Can we unlock the 𝘥𝘺𝘯𝘢𝘮𝘪𝘤 3𝘋 𝘐𝘯𝘵𝘦𝘳𝘯𝘦𝘵?! 🌎 📸 𝗗𝘆𝗻𝗣𝗼𝘀𝗲-𝟭𝟬𝟬𝗞 is our answer! Chris Rockwell has curated Internet-scale videos with camera pose annotations for you 🤩 Download: huggingface.co/datasets/nvidi…

Daniel Geng (@dangengdg) 's Twitter Profile Photo

Hello! If you like pretty images and videos and want a rec for CVPR oral session, you should def go to Image/Video Gen, Friday at 9am: I'll be presenting "Motion Prompting" Ryan Burgert will be presenting "Go with the Flow" and Pascal CHANG will be presenting "LookingGlass"

Jeongsoo Park (@jespark0) 's Twitter Profile Photo

Can AI image detectors keep up with new fakes? Mostly, no. Existing detectors are trained using a handful of models. But there are thousands in the wild! Our work, Community Forensics, uses 4800+ generators to train detectors that generalize to new fakes. #CVPR2025 🧵 (1/5)

Yiming Dou (@_yimingdou) 's Twitter Profile Photo

Ever wondered how a scene sounds👂 when you interact👋 with it? Introducing our #CVPR2025 work "Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes" -- we make 3D scene reconstructions audibly interactive! yimingdou.com/hearing_hands/

Ayush Shrivastava (@ayshrv) 's Twitter Profile Photo

Excited to share our CVPR 2025 paper on cross-modal space-time correspondence!

We present a method to match pixels across different modalities (RGB-Depth, RGB-Thermal, Photo-Sketch, and cross-style images) — trained entirely using unpaired data and self-supervision.

Linyi Jin (@jin_linyi) 's Twitter Profile Photo

Hello! If you are interested in dynamic 3D or 4D, don't miss the oral session 3A at 9 am on Saturday: Zhengqi Li will be presenting "MegaSaM" I'll be presenting "Stereo4D" and Qianqian Wang will be presenting "CUT3R"