Yang Zheng (@yang_zheng18) 's Twitter Profile
Yang Zheng

@yang_zheng18

PhD student @Stanford

ID: 1219950777944432641

Link: https://y-zheng18.github.io/ Joined: 22-01-2020 11:51:51

31 Tweets

167 Followers

156 Following

Jiaman Li (@jiaman01) 's Twitter Profile Photo

🤖 Introducing Human-Object Interaction from Human-Level Instructions! First complete system that generates physically plausible, long-horizon human-object interactions with finger motions in contextual environments, driven by human-level instructions. 🔍 Our approach: - LLMs

Xiaomeng Xu (@xiaomengxu11) 's Twitter Profile Photo

Can robots leverage their entire body to sense and interact with their environment, rather than just relying on a centralized camera and end-effector? Introducing RoboPanoptes, a robot system that achieves whole-body dexterity through whole-body vision. robopanoptes.github.io

Ian Huang (@ianhuang3d) 's Twitter Profile Photo

🏡Building realistic 3D scenes just got smarter! Introducing our #CVPR2025 work, 🔥FirePlace, a framework that enables Multimodal LLMs to automatically generate realistic and geometrically valid placements for objects into complex 3D scenes. How does it work?🧵👇

Hong-Xing "Koven" Yu (@koven_yu) 's Twitter Profile Photo

🔥Want to capture 3D dancing fluids♨️🌫️🌪️💦? No specialized equipment, just one video! Introducing FluidNexus: Now you only need one camera to reconstruct 3D fluid dynamics and predict future evolution! 🧵1/4 Web: yuegao.me/FluidNexus/ Arxiv: arxiv.org/pdf/2503.04720

Qingqing Zhao (@qingqing_zhao_) 's Twitter Profile Photo

Introduce CoT-VLA – Visual Chain-of-Thought reasoning for Robot Foundation Models! 🤖 By leveraging next-frame prediction as visual chain-of-thought reasoning, CoT-VLA uses future prediction to guide action generation and unlock large-scale video data for training. #CVPR2025

Hansheng Chen (@hanshengch) 's Twitter Profile Photo

Excited to share our work: Gaussian Mixture Flow Matching Models (GMFlow) github.com/lakonik/gmflow GMFlow generalizes diffusion models by predicting Gaussian mixture denoising distributions, enabling precise few-step sampling and high-quality generation.

Boyang Deng (@boyang_deng) 's Twitter Profile Photo

Curious about how cities have changed in the past decade? We use MLLMs to analyse 40 million Street View images to answer this. Do you know that "juice shops became a thing in NYC" and "miles of overpasses were painted BLUE in SF"? More at→boyangdeng.com/visual-chronic… (vid ↓ w/ 🔊)

Gordon Wetzstein (@gordonwetzstein) 's Twitter Profile Photo

Video generation of humans with control over body pose and facial expressions is crucial for a plethora of applications. Towards this goal, we introduce a new interspatial attention (ISA) mechanism as a scalable building block for DiT–based video generation models #SIGGRAPH2025

Gordon Wetzstein (@gordonwetzstein) 's Twitter Profile Photo

Most video models 🤯forget the past 🐌slow down over time 🔁rely on bidirectional (not causal) attention Our state-space video world models (SSM) 🧠remember across hundreds of frames ⚡️generate at constant speed ⏩is fully causal, enabling real-time rollout 1/3

Ziyi Wu (@dazitu_616) 's Twitter Profile Photo

📢 Introducing DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models Compared to vanilla DPO, we improve paired data construction and preference label granularity, leading to better visual quality and motion strength with only 1/3 of the data. 🧵

Adam W. Harley (@adamwharley) 's Twitter Profile Photo

AllTracker: Efficient Dense Point Tracking at High Resolution If you're using any point tracker in any project, this is likely a drop-in upgrade—improving speed, accuracy, and density, all at once.

Guangxuan Xiao (@guangxuan_xiao) 's Twitter Profile Photo

I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models. For those interested in the details: hanlab.mit.edu/blog/streaming…
