Alexey Bokhovkin
@abokhovkin
Computer Vision researcher @ TUM
3D Indoor Understanding
ID: 1334896500275548161
04-12-2020 16:25:15
42 Tweets
343 Followers
233 Following
Can we match visual features jointly across multiple frames? Yes! Barbara Roessle's #ICCV2023 paper proposes a differentiable pose optimization for end-to-end feature matching across multiple frames, thus obtaining better poses! barbararoessle.github.io/e2e_multi_view… youtu.be/uuLb6GfM9Cg
Diffusion models are awesome! Check out our survey on Diffusion Models for Visual Computing! We give an introduction to diffusion models and highlight how they are used by state-of-the-art methods in graphics and vision. arxiv.org/abs/2310.07204
Check out Christian Diller's CG-HOI :) We generate realistic 3D human-object interactions from object geometry and a text description. A key ingredient is explicit modeling of contact, during training and as guidance during inference. cg-hoi.christian-diller.de youtube.com/watch?v=GNyQwT…
Check out our #CVPR'24 papers on 3D human interactions, generative 3D modeling, and uncertainty-aware and unsupervised 3D semantic scene understanding! Congrats to Lei Li, David Rozenberszki, Christian Diller, Yawar Siddiqui, Shivangi, Jiapeng Tang, and Anh-Quan Cao for their amazing work!
Excited to present DiffCAD, coming to #SIGGRAPH2024! Daoyi Gao introduces the first probabilistic single-view CAD retrieval & alignment approach. We train only on synthetic data -> generalize robustly to real images! Check out the code: daoyig.github.io/DiffCAD_/ w/ David Rozenberszki, Stefan Leutenegger
GaussianSpeech: Audio-Driven Gaussian Avatars. We synthesize photorealistic and 3D-consistent talking human head avatars driven directly from spoken audio. More specifically, we introduce an efficient 3DGS-based representation, combined with an …
DNF: Generating 4D animations with dictionary-based neural fields! Xinyi Zhang presents a new dictionary-based neural field for unconditional 4D generation of deforming shapes -- generating motions with high-quality shape and temporal consistency. xzhang-t.github.io/project/DNF/
GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion. We reconstruct animatable Gaussian head avatars from monocular videos captured by commodity devices such as …
Excited to announce ScanNet++ v2! Chandan Yeshwanth and Yueh-Cheng Liu have been working tirelessly to bring: 1006 high-fidelity 3D scans, DSLR & iPhone captures, and rich semantics. Elevating 3D scene understanding to the next level! w/ Matthias Niessner kaldir.vc.in.tum.de/scannetpp
ScanNet++ v2 Benchmark Release! Test your state-of-the-art models on: Novel View Synthesis, and 3D Semantic & Instance Segmentation. Shoutout to Chandan Yeshwanth and Yueh-Cheng Liu for their incredible work! Check it out: kaldir.vc.in.tum.de/scannetpp/
Animating the Uncaptured: We animate 3D humanoid meshes using video diffusion priors given a text prompt. youtu.be/_YL1J_V3smI marcb.pro/atu Realistic motion generation for 3D characters - without motion capture! Great work by Marc Benedí and Angela Dai
ExCap3D: Multilevel Captioning of Objects in 3D Scenes. Chandan Yeshwanth generates consistent object- and part-level descriptions of objects in 3D scenes, and introduces a new dataset with 190k captions for 34k ScanNet++ objects. Project: cy94.github.io/excap3d w/ David Rozenberszki
SceneFactor code is released! SceneFactor is a factored latent diffusion model for controllable, large-scale scene synthesis and editing! w/ Quan Meng, Shubham Tulsiani, Angela Dai. Check out the code here: github.com/alexeybokhovki… We present SceneFactor at #CVPR2025 on Fri 13, 10:30