
Jeff Li
@jiefengli_jeff
Research Scientist at @NVIDIA | PhD from SJTU @sjtu1896 | Interested in 3D Computer Vision, Human Digitization | Views are my own
ID: 3035103901
https://jeffli.site 21-02-2015 18:29:23
88 Tweets
1.1K Followers
689 Following

Ever wonder why well-trained Vision Transformers still exhibit noise? We introduce Denoising Vision Transformers (DVT), led by the amazing Jiawei Yang, Katie Luo, and Jeff Li, with long-term collaborators Yonglong Tian and Kilian Weinberger. Website: jiawei-yang.github.io/DenoisingViT/ Code:


Check out our recent work led by Mathis Petrovich that generates human motions from a timeline of text prompts, similar to a typical video editor. The method operates entirely at test time, so it works with off-the-shelf motion diffusion models! Project: mathis.petrovich.fr/stmc/




Very excited to get this out: "DVT: Denoising Vision Transformers". We've identified and combated those annoying positional patterns in many ViTs. Our approach denoises them, achieving SOTA results and stunning visualizations! Learn more on our website: jiawei-yang.github.io/DenoisingViT/



BLADE: Single-view Body Mesh Learning through Accurate Depth Estimation. Shengze Wang, Jiefeng Li, Tianye Li, Ye Yuan, Henry Fuchs, Koki Nagano, Shalini De Mello, Michael Stengel. tl;dr: camera intrinsics matter for human mesh estimation; can optimize via rendering. arxiv.org/abs/2412.08640



📢 I am #hiring 2x #PhD candidates to work on Human-centric #3D #ComputerVision at the University of #Amsterdam! 📢 The positions are funded by an #ERC #StartingGrant. For details and for submitting your application please see: werkenbij.uva.nl/en/vacancies/p… Deadline: Feb 16

📣📣📣 Excited to share GENMO: A Generalist Model for Human Motion. Words can't perfectly describe human motion, so we built GENMO. It's everything-to-motion. Video, Text, Music, Audio, Keyframes, Spatial Control… GENMO handles it all within a single model. Two