Xin Cai (@xincai1998)'s Twitter Profile
Xin Cai

@xincai1998

ID: 1453750772865269766

Joined: 28-10-2021 15:49:50

75 Tweets

90 Followers

560 Following

AK (@_akhaliq):

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision paper page: huggingface.co/papers/2312.16… We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in

Xingyi He (@xingyihe1):

Excited to share our work MatchAnything: We pre-train strong universal image matching models that exhibit remarkable generalizability on unseen multi-modality matching and registration tasks. Project page: zju3dv.github.io/MatchAnything/ Huggingface Demo: huggingface.co/spaces/LittleF…

Saining Xie (@sainingxie):

When I first saw diffusion models, I was blown away by how naturally they scale during inference: you train them with fixed flops, but during test time, you can ramp it up by like 1,000x. This was way before it became a big deal with o1. But honestly, the scaling isn’t that
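The fixed-training-flops / elastic-inference-flops point can be sketched in a few lines. This is a toy with a hypothetical stand-in for a trained denoiser (not from any real diffusion codebase): the model's weights are frozen, yet the sampler can spend 1,000x more compute simply by taking more steps.

```python
import numpy as np

def denoise_step(x, t):
    # Stand-in for a trained denoiser network (hypothetical toy that
    # pulls samples toward the origin). In practice this is a fixed,
    # pretrained model whose weights never change between runs.
    return -x

def sample(num_steps, x0=None, seed=0):
    # Plain Euler sampler over t in [1, 0]. Training cost is fixed;
    # inference cost scales linearly with num_steps.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(2) if x0 is None else x0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        x = x + denoise_step(x, t) * dt
    return x

# The same frozen model can be run with 10 steps or 10,000 steps:
coarse = sample(10)
fine = sample(10_000)   # ~1,000x more inference compute, no retraining
```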

AK (@_akhaliq):

The MatAnyone app is out on the AI app store: Stable Video Matting with Consistent Memory Propagation. MatAnyone is a practical human video matting framework supporting target assignment. Drop your video/image, assign the target masks with a few clicks, and get the

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

Is Noise Conditioning Necessary for Denoising Generative Models?

"Motivated by research on blind image denoising, we investigate a variety of denoising-based generative models in the absence of noise conditioning. To our surprise, most models exhibit graceful degradation, and in

Jon Barron (@jon_barron):

I randomly happened upon a recording of a talk I gave at Stanford in late 2019. It's a good snapshot of some of the computational photography work I was doing from 2013-2019 (denoising, white balance, tone mapping, portrait mode). Fun research decade in retrospect!

Dreaming Tulpa 🥓👑 (@dreamingtulpa):

Goodbye LoRA (Part 17) 👋

Diffusion Self-Distillation can generate high-quality images of specific subjects in new settings by preserving identity. Also supports relighting 👌

Saining Xie (@sainingxie):

Some further thoughts on the idea of "thinking with images":

1) zero-shot tool use is limited -- you can’t just call an object detector to do visual search. That’s why approaches like VisProg/ViperGPT/Visual-sketchpad will not generalize or scale well.

2) visual search needs to

Jon Barron (@jon_barron):

Here's my 3DV talk, in chapters:

1) Intro / NeRF boilerplate.
2) Recent reconstruction work.
3) Recent generative work.
4) Radiance fields as a field.
5) Why generative video has bitter-lessoned 3D.
6) Why generative video hasn't bitter-lessoned 3D.

5 & 6 are my favorites.

Alec Helbling (@alec_helbling):

Flow matching produces smooth, deterministic trajectories. In contrast, the sampling process of a diffusion model is chaotic, resembling the random motion of gas particles.
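The contrast can be illustrated numerically with a toy linear drift standing in for a learned model (an assumption; real samplers use a trained network): the ODE path is bit-identical across runs, while SDE paths jitter differently per seed.

```python
import numpy as np

def ode_step(x, dt):
    # Flow-matching-style sampling: a deterministic velocity field
    # (toy linear field, a stand-in for a learned model).
    return x + (-x) * dt

def sde_step(x, dt, rng):
    # Diffusion-style sampling: the same drift plus Brownian noise,
    # so each trajectory wanders like a gas particle.
    return x + (-x) * dt + np.sqrt(2 * dt) * rng.standard_normal()

def rollout(step, n=1000, **kw):
    x, dt, path = 1.0, 1e-3, []
    for _ in range(n):
        x = step(x, dt, **kw)
        path.append(x)
    return np.array(path)

ode_a = rollout(ode_step)
ode_b = rollout(ode_step)                                # identical: deterministic
sde_a = rollout(sde_step, rng=np.random.default_rng(1))
sde_b = rollout(sde_step, rng=np.random.default_rng(2))  # differ: stochastic
```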

Cheng Lu (@clu_cheng):

I might have missed something, but isn’t this approach just the same as continuous-time consistency models? sCM: parameterize f(x_t,t) by DDIM solver. Meanflow: parameterize f(x_t, t) = x_t + (t - r) u(x_t, t) which is Euler solver. The training algorithm is just the same as
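For context, both parameterizations being compared can be read as one-step ODE solves. A sketch of the MeanFlow side (sign conventions differ between papers; MeanFlow's own write-up defines an average velocity and steps with a minus sign, so treat this as illustrative rather than authoritative):

```latex
% Average velocity over [r, t] for instantaneous velocity v:
u(x_t, r, t) = \frac{1}{t - r} \int_r^t v(x_s, s)\, ds
% One Euler-shaped jump from time t to time r, exact in u:
f(x_t, t) = x_t - (t - r)\, u(x_t, r, t)
```

sCM instead writes f as one DDIM step of the probability-flow ODE; in both cases the consistency-style objective sits on top of a one-step-solver parameterization of f, which is the similarity the tweet is pointing at.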

Kosta Derpanis (@csprofkgd):

For the generative modeling folks, check out this lineup 👀 Happening TODAY!

#CVPR2025 Workshop on Visual Generative Modeling: What’s After Diffusion?

Seohong Park (@seohong_park):

Q-learning is not yet scalable

seohong.me/blog/q-learnin…

I wrote a blog post about my thoughts on scalable RL algorithms.

To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
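For readers unfamiliar with the algorithm under discussion, here is a minimal tabular Q-learning sketch on a 5-state chain (an illustrative toy, not from the linked post): action 1 moves right, action 0 moves left, and reaching the final state pays reward 1.

```python
import numpy as np

n_states, n_actions, gamma, alpha, eps = 5, 2, 0.9, 0.5, 0.3
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

for _ in range(2000):
    s = int(rng.integers(0, n_states - 1))  # random non-terminal start state
    for _ in range(20):
        # Epsilon-greedy behavior policy.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Off-policy TD backup: bootstrap from max_a' Q(s', a'),
        # regardless of the action the behavior policy takes next.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r == 1.0:
            break

greedy = Q.argmax(axis=1)  # learned policy: prefer "right" in every non-terminal state
```

The `max` inside the TD target is what makes this off-policy (and, per the post's argument, what makes it hard to scale: the bootstrapped target compounds its own estimation errors).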

Phillip Isola (@phillip_isola):

Our computer vision textbook is now available for free online here: visionbook.mit.edu We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful and feel free to submit Github issues to help us improve the text!

Ilya Chugunov @ilyac on bsky (@_ilya_c):

My favourite photo from my last vacation was #CapturedWithIndigo, the computational photography app that Adobe Nextcam just released after years of hard work! (and that I helped debug at least a little)

App: apps.apple.com/us/app/project…

Blog with more info: research.adobe.com/articles/indig…

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

Transition Matching: Scalable and Flexible Generative Modeling

"This paper introduces Transition Matching (TM), a novel discrete-time, continuous-state generative paradigm that unifies and advances both diffusion/flow models and continuous AR generation. TM decomposes complex

Yam Peleg (@yampeleg):

Wild paper

They prove (!!) a transformer block (Attn + MLP) running on prompt

Outputs the same logits with no prompt

If MLP weights updated by vector:
W′ = W + ΔW

Calc from attn latent:
ΔW = (W·Δa) × (A(x)ᵀ / ‖A(x)‖²)

Given prompt:
Δa = A(C, x) − A(x)

Fucking fine tuning.
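The identity in the tweet checks out numerically. With Δa = A(C, x) − A(x) and ΔW the rank-1 outer product above, the patched MLP applied to the prompt-free activation reproduces the prompted one exactly: (W + ΔW)·A(x) = W·A(C, x). A verification with random vectors standing in for real attention outputs (toy dimensions, nothing from the actual paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 5
W = rng.standard_normal((d_out, d_in))   # MLP weight
a_x = rng.standard_normal(d_in)          # A(x): attention output, no prompt
a_cx = rng.standard_normal(d_in)         # A(C, x): attention output with prompt C
delta_a = a_cx - a_x                     # Δa = A(C, x) − A(x)

# ΔW = (W·Δa) · A(x)ᵀ / ‖A(x)‖²  -- an outer product, hence rank 1.
dW = np.outer(W @ delta_a, a_x) / (a_x @ a_x)

lhs = (W + dW) @ a_x    # patched weights on the prompt-free activation
rhs = W @ a_cx          # original weights on the prompted activation
assert np.allclose(lhs, rhs)
```

The trick is that dW @ a_x collapses to W @ delta_a because a_xᵀ·a_x cancels the ‖A(x)‖² normalizer, so the prompt's effect has been folded into a rank-1 weight update.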

James Chen (@jchencxh):

I (finally) wrote a blog post. I talk about what's needed for creating a learning bias/objective for semantic reuse, using insights from what's effective in SSL and LLMs. The key arguments: Section 1: LLMs supervise the construction of semantics by predicting the next token.