Anh Thai (@ngailapdi)'s Twitter Profile
Anh Thai

@ngailapdi

ID: 4712972292

Link: https://anhthai1997.wordpress.com/ Joined: 05-01-2016 10:08:06

203 Tweets

612 Followers

1.1K Following

Stefan Stojanov (@sstj389)

Extracting structure that’s implicitly learned by video foundation models _without_ relying on labeled data is a fundamental challenge. What’s a better place to start than extracting motion? Temporal correspondence is a key building block of perception. Check out our paper!

Bolin Lai (@bryanislucky)

📢#CVPR2025 Introducing InstaManip, a novel multimodal autoregressive model for few-shot image editing. 🎯InstaManip can learn a new image editing operation from textual and visual guidance via in-context learning, and apply it to new query images. [1/8] bolinlai.github.io/projects/Insta…

Jia-Bin Huang (@jbhuang0604)

Exploration is key for robots to generalize, especially in open-ended environments with vague goals and sparse rewards. BUT, how do we go beyond random poking? Wouldn't it be great to have a robot that explores an environment just like a kid? Introducing Imagine, Verify,

Shangchen Zhou (@shangchenzhou)

With #ObjectClear, you can now remove any objects, along with their shadows and reflections, from your images in just a few clicks or strokes! 👉Try our demo (click version): huggingface.co/spaces/jixin01… Big thanks to AK and Adina Yakup!

Zhenjun Zhao (@zhenjun_zhao)

Ov3R: Open-Vocabulary Semantic 3D Reconstruction from RGB Videos Ziren Gong, Xiaohan Li, Fabio Tosi, Jiawei Han, Stefano Mattoccia, Jianfei Cai, Matteo Poggi tl;dr: CLIP->SLAM3R; CLIP+DINO+CG3D->2D-3D fused descriptor arxiv.org/abs/2507.22052

Zhenjun Zhao (@zhenjun_zhao)

Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images Xiangyu Sun, Haoyi Jiang, Liu Liu, Seungtae Nam, Gyeongjin Kang, Xinjie Wang, Wei Sui, Zhizhong Su, Wenyu Liu, Xinggang Wang, Eunbyung Park tl;dr:

Zhenjun Zhao (@zhenjun_zhao)

A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation Shuting He, Peilin Ji, Yitong Yang, Changshuo Wang, Jiayi Ji, Yinglin Wang, Henghui Ding tl;dr: in title arxiv.org/abs/2508.09977

Rana Hanocka (@ranahanocka)

We’ve been building something we’re 𝑟𝑒𝑎𝑙𝑙𝑦 excited about – LL3M: LLM-powered agents that turn text into editable 3D assets. LL3M models shapes as interpretable Blender code, making geometry, appearance, and style easy to modify. 🔗 threedle.github.io/ll3m 1/

Xingang Pan (@xingangp)

Introducing 𝗦𝗧𝗿𝗲𝗮𝗺𝟯𝗥, a new 3D geometric foundation model for efficient 3D reconstruction from streaming input. Similar to LLMs, STream3R uses causal attention during training and KVCache at inference. No need to worry about post-alignment or reconstructing from scratch.

Zhenjun Zhao (@zhenjun_zhao)

GSFix3D: Diffusion-Guided Repair of Novel Views in Gaussian Splatting Jiaxin Wei, Stefan Leutenegger, Simon Schaefer tl;dr: fuse mesh and 3DGS->rendered images->pretrained diffusion model+random mask augmentation->removes artifacts+inpainting+completion arxiv.org/abs/2508.14717

Kwang Moo Yi (@kwangmoo_yi)

Wei et al., "GSFix3D: Diffusion-Guided Repair of Novel Views in Gaussian Splatting" Fine-tune a diffusion model to fix missing holes and such in 3DGS reconstructions. Similar to other works that do this, but interestingly, it uses meshes alongside 3DGS to remove floaters, etc.

Fei-Fei Li (@drfeifei)

A picture is now worth more than a thousand words in genAI; it can be turned into a full 3D world! And you can stroll in this garden for as long as you like; it will still be there.

MrNeRF (@janusch_patas)

Human3R: Everyone Everywhere All at Once Note: I recorded the video from the interactive demo on their project page (linked in the comment below). Abstract (excerpt): Human3R jointly recovers global multi-person SMPL-X bodies ("everyone"), dense 3D scenes ("everywhere"), and

Michael Niemeyer (@mi_niemeyer)

How do we reconstruct a 3D scene from photos with varying exposures? Standard methods often fail, leaving you with blown-out colors or disturbing shadows. We're excited to introduce Neural Exposure Fields (NExF), our new work accepted at #NeurIPS2025! 🧵

Tenny Yin (@tennyyin)

Does VGGT offer an edge over DINO in spatial tasks? New research shows that visual-only features (DINO) outperform visual-geometry features (VGGT) here!

Xiaoyang Wu (@xiaoyangwu_)

Introducing Concerto 🎶 Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations. What is it: Concerto is a self-supervised Point Transformer V3 that jointly learns from 2D and 3D modalities, producing rich spatial representations. It can take both point clouds and