Yang Zheng (@yang_zheng18) Twitter Tweets • TwiCopy

Jiaman Li

a year ago

🤖 Introducing Human-Object Interaction from Human-Level Instructions! First complete system that generates physically plausible, long-horizon human-object interactions with finger motions in contextual environments, driven by human-level instructions. 🔍 Our approach: - LLMs

517 likes · 18 replies · 112 retweets

Xiaomeng Xu

@xiaomengxu11

10 months ago

Can robots leverage their entire body to sense and interact with their environment, rather than just relying on a centralized camera and end-effector? Introducing RoboPanoptes, a robot system that achieves whole-body dexterity through whole-body vision. robopanoptes.github.io

308 likes · 11 replies · 57 retweets

Ian Huang

@ianhuang3d

8 months ago

🏡Building realistic 3D scenes just got smarter! Introducing our #CVPR2025 work, 🔥FirePlace, a framework that enables Multimodal LLMs to automatically generate realistic and geometrically valid placements for objects into complex 3D scenes. How does it work?🧵👇

384 likes · 23 replies · 105 retweets

Hong-Xing "Koven" Yu

@koven_yu

7 months ago

🔥Want to capture 3D dancing fluids♨️🌫️🌪️💦? No specialized equipment, just one video! Introducing FluidNexus: Now you only need one camera to reconstruct 3D fluid dynamics and predict future evolution! 🧵1/4 Web: yuegao.me/FluidNexus/ Arxiv: arxiv.org/pdf/2503.04720

114 likes · 5 replies · 96 retweets

Qingqing Zhao

@qingqing_zhao_

7 months ago

Introducing CoT-VLA – Visual Chain-of-Thought reasoning for Robot Foundation Models! 🤖 By leveraging next-frame prediction as visual chain-of-thought reasoning, CoT-VLA uses future prediction to guide action generation and unlock large-scale video data for training. #CVPR2025

291 likes · 5 replies · 55 retweets

Hansheng Chen

@hanshengch

7 months ago

Excited to share our work: Gaussian Mixture Flow Matching Models (GMFlow)
github.com/lakonik/gmflow
GMFlow generalizes diffusion models by predicting Gaussian mixture denoising distributions, enabling precise few-step sampling and high-quality generation.

122 likes · 1 reply · 31 retweets

Boyang Deng

@boyang_deng

7 months ago

Curious about how cities have changed in the past decade? We use MLLMs to analyse 40 million Street View images to answer this. Do you know that "juice shops became a thing in NYC" and "miles of overpasses were painted BLUE in SF"? More at→boyangdeng.com/visual-chronic… (vid ↓ w/ 🔊)

88 likes · 1 reply · 15 retweets

Gordon Wetzstein

@gordonwetzstein

6 months ago

Video generation of humans with control over body pose and facial expressions is crucial for a plethora of applications. Towards this goal, we introduce a new interspatial attention (ISA) mechanism as a scalable building block for DiT–based video generation models #SIGGRAPH2025

203 likes · 5 replies · 20 retweets

Yang Zheng

@yang_zheng18

6 months ago

More cool videos🔥 and details available on our website: dsaurus.github.io/isa4d/

9 likes · 0 replies · 0 retweets

Gordon Wetzstein

@gordonwetzstein

5 months ago

Most video models
🤯forget the past
🐌slow down over time
🔁rely on bidirectional (not causal) attention
Our state-space video world models (SSM)
🧠remember across hundreds of frames
⚡️generate at constant speed
⏩are fully causal, enabling real-time rollout
1/3

183 likes · 3 replies · 14 retweets

Ziyi Wu

@dazitu_616

5 months ago

📢 Introducing DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
Compared to vanilla DPO, we improve paired data construction and preference label granularity, leading to better visual quality and motion strength with only 1/3 of the data. 🧵

169 likes · 2 replies · 36 retweets

Adam W. Harley

@adamwharley

5 months ago

AllTracker: Efficient Dense Point Tracking at High Resolution
If you're using any point tracker in any project, this is likely a drop-in upgrade, improving speed, accuracy, and density, all at once.

240 likes · 2 replies · 39 retweets

Guangxuan Xiao

@guangxuan_xiao

3 months ago

I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models. For those interested in the details: hanlab.mit.edu/blog/streaming…

895 likes · 17 replies · 114 retweets