Xin Kong (@xinkong_ic)'s Twitter Profile
Xin Kong

@xinkong_ic

PhD candidate in Dyson Robot Vision Lab at Imperial College London

ID: 1457716390526341130

Link: http://kxhit.github.io | Joined: 08-11-2021 14:28:01

51 Tweets

1.1K Followers

1.1K Following

AK (@_akhaliq)

Wonder3D: Single Image to 3D using Cross-Domain Diffusion paper page: huggingface.co/papers/2310.15… introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sampling (SDS) have shown

Xin Kong (@xinkong_ic)

It's always important to know the limitations of large models, especially when applying them to robotics/AD. I believe the lack of 3D spatial reasoning will be tackled by scalable 3D training with pose-embedded 2D images.

Gwangbin Bae (@baegwangbin)

Excited to introduce 𝗗𝗦𝗜𝗡𝗘! (#CVPR2024) baegwangbin.github.io/DSINE/ We push the limits of single-image surface normal estimation by rethinking the inductive biases needed for the task. See you in Seattle!

Xin Kong (@xinkong_ic)

Apart from the multiview MAE-style transformer's pre-training, the pointmap part is trained on 8.5M x 2 = 17M imgs with 3D GT points (8 datasets: ScanNet++, CO3D-v2, MegaDepth, ARKitScenes, etc.), which is nearly 2x bigger than Objaverse Zero123's 9.6M imgs. Is this scale enough for 3DV?

Xin Kong (@xinkong_ic)

Best 3D Object Gen tool I have tried so far. The multi-view input requires a paid subscription tho. Very interested in the tech details and the roadmap. Deemos Tech The scale of 3D mesh data (img2mesh) seems enough for object-level. How about scene-level? Multiview/Seq-img2img?

Andrew Davison (@ajddavison)

This is an opportunity to do a PhD with me at Imperial College, fully funded and starting in October this year. Apply via the link below by 12th June next week. On-sensor vision will be very important to the future of low power vision in robotics + AR/VR. jobs.ac.uk/job/DHT079/res…

Xin Kong (@xinkong_ic)

👍 SD3 -> diffusers ✅ Better text generation in images and varying aspect ratios 🔍 Rectified Flow Matching + DiT + Separate Image/Text Encoder

Xin Kong (@xinkong_ic)

Text2Video (16 frames), camera conditioning with Plücker embedding + learnable FF, fine-tuned from the Snap Video model (36 × 64 resolution), FIT backbone, 1d x 8x A100-40GB training on 65K video clips from Re10K.
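
For readers unfamiliar with the Plücker embedding mentioned in the post above, here is a minimal sketch of the general idea: each pixel's camera ray is encoded as a 6-D vector made of the ray's moment (o x d) and its unit direction d. It is not taken from the paper being summarized. The function name `plucker_embedding`, the single-focal-length pinhole camera, the centred principal point, and the (o x d, d) ordering are all assumptions for illustration; some works order the two halves the other way round.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as sequences of floats."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plucker_embedding(height, width, focal, rotation, origin):
    """Return a height x width grid of 6-D Plücker coordinates (o x d, d).

    Assumptions (hypothetical, not from the source post):
      - pinhole camera, one focal length in pixels, principal point at the
        image centre;
      - `rotation` is a 3x3 camera-to-world matrix as nested lists;
      - `origin` is the camera centre in world coordinates.
    """
    cx, cy = width / 2.0, height / 2.0
    grid = []
    for v in range(height):
        row = []
        for u in range(width):
            # Ray through the pixel centre, expressed in the camera frame.
            d_cam = ((u + 0.5 - cx) / focal, (v + 0.5 - cy) / focal, 1.0)
            # Rotate the ray into the world frame.
            d_world = tuple(
                sum(rotation[i][j] * d_cam[j] for j in range(3))
                for i in range(3)
            )
            # Normalize to a unit direction.
            norm = math.sqrt(sum(c * c for c in d_world))
            d = tuple(c / norm for c in d_world)
            # Moment of the ray about the world origin.
            m = cross(origin, d)
            row.append(m + d)  # tuple concatenation: 6 values per pixel
        grid.append(row)
    return grid


# Example at the 36 x 64 resolution quoted in the post (focal is made up).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
grid = plucker_embedding(36, 64, 50.0, identity, (1.0, 0.0, 0.0))
print(len(grid), len(grid[0]), len(grid[0][0]))  # 36 64 6
```

The resulting 6-channel map has the same spatial size as a frame, so it can be stacked with or injected alongside the image features as a per-pixel camera signal. How the post's model actually consumes it is not stated there.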