Ziyang Chen (@czyangchen)'s Twitter Profile
Ziyang Chen

@czyangchen

Ph.D. Student at @UMich, advised by @andrewhowens
multimodal learning, audio-visual learning
prev. research intern at @Adobe and @AIatMeta

ID: 1404892067373928448

Link: https://ificl.github.io/ · Joined: 15-06-2021 20:02:35

80 Tweets

358 Followers

413 Following

Sarah Jabbour (@sarahjabbour_)'s Twitter Profile Photo

📢Presenting 𝐃𝐄𝐏𝐈𝐂𝐓: Diffusion-Enabled Permutation Importance for Image Classification Tasks #ECCV2024

We use permutation importance to compute dataset-level explanations for image classifiers using diffusion models (without access to model parameters or training data!)
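
For readers unfamiliar with permutation importance, here is a minimal sketch of the dataset-level version the tweet describes, assuming a hypothetical `inpaint_concept` helper that stands in for DEPICT's diffusion-based editing step; note that only the classifier's predictions are needed, not its parameters or training data.

```python
# Minimal sketch of dataset-level permutation importance for an image
# classifier. `classify` and `inpaint_concept` are hypothetical callables
# supplied by the user; the diffusion-based editing from DEPICT is hidden
# behind `inpaint_concept(image, source_image, concept)`.
import numpy as np

def permutation_importance(classify, images, labels, concept, inpaint_concept, seed=0):
    """Drop in accuracy when `concept` is permuted across the dataset."""
    rng = np.random.default_rng(seed)
    baseline = np.mean([classify(x) == y for x, y in zip(images, labels)])

    # "Permute" the concept: re-render each image with the concept taken
    # from a randomly chosen other image, keeping everything else fixed.
    perm = rng.permutation(len(images))
    edited = [inpaint_concept(x, images[j], concept) for x, j in zip(images, perm)]
    edited_acc = np.mean([classify(x) == y for x, y in zip(edited, labels)])

    return baseline - edited_acc  # large drop => the concept matters
```
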
Ayush Shrivastava (@ayshrv)'s Twitter Profile Photo

We present Global Matching Random Walks, a simple self-supervised approach to the Tracking Any Point (TAP) problem, accepted to #ECCV2024. We train a global matching transformer to find cycle consistent tracks through video via contrastive random walks (CRW).
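
For readers unfamiliar with contrastive random walks, below is a hedged sketch of a CRW-style cycle-consistency loss on per-frame point features: walk forward through frame-to-frame affinities, walk back, and ask each point to land where it started. The global matching transformer that produces the features is abstracted away, and this is not the paper's exact objective.

```python
# Hedged sketch of a contrastive random walk (CRW) cycle-consistency loss.
# `feats` is a list of per-frame point features, each of shape (N_t, D),
# assumed L2-normalized.
import torch
import torch.nn.functional as F

def crw_cycle_loss(feats, temperature=0.07):
    def transition(a, b):
        # Row-stochastic affinity from points in frame a to points in frame b.
        return F.softmax(a @ b.t() / temperature, dim=1)

    # Walk forward t0 -> t1 -> ... -> tK, then backward to t0.
    walk = torch.eye(feats[0].shape[0], device=feats[0].device)
    for a, b in zip(feats[:-1], feats[1:]):
        walk = walk @ transition(a, b)
    for a, b in zip(reversed(feats[1:]), reversed(feats[:-1])):
        walk = walk @ transition(a, b)

    # Cycle consistency: each starting point should return to itself.
    targets = torch.arange(walk.shape[0], device=walk.device)
    return F.nll_loss(torch.log(walk + 1e-8), targets)
```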

Daniel Geng (@dangengdg)'s Twitter Profile Photo

What happens when you train a video generation model to be conditioned on motion? Turns out you can perform "motion prompting," just like you might prompt an LLM! Doing so enables many different capabilities. Here are a few examples – check out this thread 🧵 for more results!

hugo flores garcía 🌻 (@hugggof)'s Twitter Profile Photo

new paper! 🗣️Sketch2Sound💥 Sketch2Sound can create sounds from sonic imitations (i.e., a vocal imitation or a reference sound) via interpretable, time-varying control signals. paper: arxiv.org/abs/2412.08550 web: hugofloresgarcia.art/sketch2sound
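
As a sketch of what "interpretable, time-varying control signals" can look like in practice, the snippet below extracts frame-wise loudness and spectral-centroid curves from a vocal imitation with librosa; this is an assumption-laden illustration, not Sketch2Sound's actual feature pipeline, and the generative model conditioned on these curves is not shown.

```python
# Hedged sketch: extract two interpretable, time-varying control curves
# (loudness in dB and spectral centroid in Hz) from a sonic imitation.
import librosa
import numpy as np

def control_signals(path, hop_length=512):
    y, sr = librosa.load(path, sr=None, mono=True)

    # Frame-wise loudness (RMS energy) converted to dB.
    rms = librosa.feature.rms(y=y, hop_length=hop_length)[0]
    loudness_db = librosa.amplitude_to_db(rms, ref=np.max)

    # Frame-wise spectral centroid as a rough "brightness" curve.
    centroid_hz = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop_length)[0]

    times = librosa.frames_to_time(np.arange(len(rms)), sr=sr, hop_length=hop_length)
    return times, loudness_db, centroid_hz
```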

Linyi Jin (@jin_linyi)'s Twitter Profile Photo

Introducing 👀Stereo4D👀 A method for mining 4D from internet stereo videos. It enables large-scale, high-quality, dynamic, *metric* 3D reconstructions, with camera poses and long-term 3D motion trajectories. We used Stereo4D to make a dataset of over 100k real-world 4D scenes.
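
For context on why stereo video yields *metric* geometry, here is a minimal back-projection sketch using the standard disparity-to-depth relation (depth = focal × baseline / disparity); the actual Stereo4D pipeline (pose estimation, long-term track fusion, filtering) is far more involved and is not reproduced here.

```python
# Hedged sketch of metric back-projection from a calibrated stereo pair.
import numpy as np

def backproject_stereo(disparity, fx, fy, cx, cy, baseline_m):
    """disparity: (H, W) array in pixels; returns (H, W, 3) points in meters."""
    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    depth = np.full_like(disparity, np.nan, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = fx * baseline_m / disparity[valid]  # metric depth

    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)
```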

Daniel Geng (@dangengdg)'s Twitter Profile Photo

I'll be presenting "Images that Sound" today at #NeurIPS2024! East Exhibit Hall A-C #2710. Come say hi to me and Andrew Owens :) (Ziyang Chen sadly could not make it, but will be there in spirit :') )

Sarah Jabbour (@sarahjabbour_)'s Twitter Profile Photo

I’m on the PhD internship market for Spr/Summer 2025! I have experience in multimodal AI (EHR, X-ray, text), explainability for image models w/ genAI, clinician-AI interaction (surveyed 700+ doctors), and tabular foundation models. Please reach out if you think there’s a fit!

Tiange Luo (@tiangeluo)'s Twitter Profile Photo

Will VLMs adhere strictly to their learned priors, unable to perform visual reasoning on content that never existed on the Internet? We propose ViLP, a benchmark designed to probe the visual-language priors of VLMs by constructing Question-Image-Answer triplets that deliberately…
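
One simple way to probe such priors, sketched below under assumed interfaces, is to compare a VLM's answer with and without the image for each Question-Image-Answer triplet; `vlm_answer` is a hypothetical callable, and the real ViLP protocol may differ.

```python
# Hedged sketch: if a model answers the same way with and without the image,
# it is leaning on its learned priors rather than the visual evidence.
from dataclasses import dataclass

@dataclass
class QIATriplet:
    question: str
    image_path: str   # image whose content contradicts the "prior" answer
    answer: str       # correct answer, only recoverable from the image

def prior_reliance(vlm_answer, triplets):
    """Fraction of triplets where the image changes nothing (prior-driven)."""
    prior_driven = 0
    for t in triplets:
        blind = vlm_answer(t.question, image=None)         # text-only answer
        grounded = vlm_answer(t.question, image=t.image_path)
        prior_driven += int(blind == grounded != t.answer)  # unchanged and wrong
    return prior_driven / len(triplets)
```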

Luma AI (@lumalabsai)'s Twitter Profile Photo

Introducing Modify Video. Reimagine any video. Shoot it in post with director-grade control over style, character, and setting. Restyle expressive performances, swap entire worlds, or redesign the frame to your vision. Shoot once. Shape infinitely.

Phillip Isola (@phillip_isola)'s Twitter Profile Photo

In arxiv.org/abs/2510.02425, we find if you ask an LLM to “imagine seeing,” then how it processes text becomes more like how a vision system would represent that same scene.

If you ask it to “imagine hearing,” its representation becomes more like that of an auditory model.

3/9
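
A common way to quantify this kind of cross-modal representational similarity is linear CKA over paired embeddings; the sketch below illustrates that comparison and is not necessarily the metric used in the paper. Here `X` would hold LLM embeddings of scene descriptions (optionally prefixed with an "imagine seeing ..." prompt) and `Y` a vision model's embeddings of images of the same scenes, with rows paired by scene.

```python
# Hedged sketch of linear Centered Kernel Alignment (CKA) between two
# embedding matrices X (n, d1) and Y (n, d2) with rows paired by example.
import numpy as np

def linear_cka(X, Y):
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)  # 1.0 means identical (up to linear maps)
```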