Xin Kong (@xinkong_ic) Twitter Tweets • TwiCopy

a year ago

It's always important to know the limitations of large models, especially when applying them to robotics/AD. I believe the lack of 3D spatial reasoning will be tackled by scalable 3D training with pose-embedded 2D images.

Chris Offner

@chrisoffner3d

7 months ago

@JeromeRevaud I thought you must be joking, so I ran it on these two of my own pictures (excuse my dirty dishes 😉) and wow, it works beautifully.

Gwangbin Bae

@baegwangbin

7 months ago

Excited to introduce 𝗗𝗦𝗜𝗡𝗘! (#CVPR2024) baegwangbin.github.io/DSINE/ We push the limits of single-image surface normal estimation by rethinking the inductive biases needed for the task. See you in Seattle!

Xin Kong

7 months ago

Apart from the multiview MAE-style transformer pre-training, the pointmap part is trained on 8.5M x 2 = 17M imgs with 3D GT points (8 datasets: ScanNet++, CO3D-v2, MegaDepth, ARKitScenes, etc.), which is nearly 2x the 9.6M imgs of Objaverse Zero123. Is this scale enough for 3DV?

Xin Kong

7 months ago

Playing with the SoTA Gaussian Splatting SLAM system and its interactive GUI is enjoyable! Congrats on the code release!

Xin Kong

4 months ago

Best 3D Object Gen tool I have tried so far. The multi-view input requires a paid subscription tho. Very interested in the tech details and the roadmap. Deemos Tech The scale of 3D mesh data (img2mesh) seems enough for object-level. How about scene-level? Multiview/Seq-img2img?

Andrew Davison

@ajddavison

4 months ago

This is an opportunity to do a PhD with me at Imperial College, fully funded and starting in October this year. Apply via the link below by 12th June next week. On-sensor vision will be very important to the future of low power vision in robotics + AR/VR. jobs.ac.uk/job/DHT079/res…

Xin Kong

3 months ago

👍SD3 -> diffusers ✅Better text generation in images and varying aspect ratios 🔍Rectified Flow Matching +DiT+Separate Image/Text Encoder
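For readers unfamiliar with the "Rectified Flow Matching" item, here is a minimal sketch of how one training pair can be built. It assumes the common straight-line formulation x_t = (1 - t) * x0 + t * noise with velocity target noise - x0; the tweet does not spell out the formulation, and the function name `rectified_flow_pair` is hypothetical, not part of diffusers.

```python
def rectified_flow_pair(x0, noise, t):
    """Build one rectified-flow training pair (illustrative assumption).

    x0    : clean sample, as a flat list of floats
    noise : Gaussian noise sample of the same length
    t     : time in [0, 1]; t=0 gives the data, t=1 gives pure noise
    """
    # Point on the straight line between the data and the noise.
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]
    # Regression target: the constant velocity along that line.
    velocity = [b - a for a, b in zip(x0, noise)]
    return x_t, velocity
```

A network would be trained to predict `velocity` from `x_t` and `t`; because the path is a straight line, the target does not depend on `t`.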

Xin Kong

3 months ago

🥵#CVPR2024

Xin Kong

2 months ago

Text2Video (16 frames), camera conditioning with Plücker embedding + learnable FF, finetuned from the Snap Video model (36 × 64 resolution), FIT backbone, 1dx8xA100-40GB training on 65K video clips from Re10K.
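As a rough illustration of the Plücker embedding mentioned above, the sketch below computes a 6-D ray embedding for a single pixel. It assumes a pinhole camera, a camera-to-world rotation, and the (o x d, d) ordering, which is one common convention; the tweet does not specify these details, and all function and parameter names here are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def matvec(m, v):
    """Multiply a 3x3 matrix (nested sequences) by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def plucker_ray(u, v, fx, fy, cx, cy, rot_c2w, cam_center):
    """Return the 6-D Plücker embedding (o x d, d) of the ray through pixel (u, v).

    fx, fy, cx, cy : assumed pinhole intrinsics
    rot_c2w        : assumed 3x3 camera-to-world rotation
    cam_center     : camera center o in world coordinates
    """
    # Back-project the pixel to a direction in the camera frame.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into the world frame and normalize to get the ray direction d.
    d_world = normalize(matvec(rot_c2w, d_cam))
    # The moment o x d encodes where the ray sits in space.
    moment = cross(cam_center, d_world)
    return moment + d_world
```

Evaluating `plucker_ray` at every pixel of every frame yields a per-pixel, 6-channel camera map that can be fed to the video model alongside the frames.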
