Songwei Ge (@songwei_ge)'s Twitter Profile
Songwei Ge

@songwei_ge

Ph.D. student at @UMDCS

ID: 1309340614836846595

http://songweige.github.io · Joined 25-09-2020 03:55:08

112 Tweets

354 Followers

196 Following

maxwell jones (@maxwell54650346)

Learning style from a single image is difficult, but what if you had access to an **image pair** instead? I’m excited to share our #SIGGRAPHASIA2024 work PairCustomization, on customizing text-to-image models with a single image pair!! project page: paircustomization.github.io

Songwei Ge (@songwei_ge)

David and Aleks are going to present our work on making sense of SDS this Friday at #NeurIPS2024! Please check it out if you are around!

Jia-Bin Huang (@jbhuang0604)

As my kids are singing APT non-stop these days, I did a bit of reverse engineering of the APT music video and tried to understand why the MV is so addictive. Here is what I learned.

Lea Müller (@leamue27)

Humans and Structure from Motion. We jointly reconstruct 3D humans, the scene point cloud, and cameras from images captured with sparse, uncalibrated cameras. ✨Enjoy reading & happy holidays✨ Project page: muelea.github.io/hsfm

Ming-Yu Liu (@liu_mingyu)

github.com/NVIDIA/Cosmos Cosmos is a developer-first platform designed to help physical AI builders accelerate their development. It has pre-trained world foundation models (diffusion & autoregressive) in different sizes and video tokenizers. They are open models with permissive

Jiaxin Ge (@aomaru_21490)

Introducing "AutoPresent: Designing Structured Visuals From Scratch". We employ code generation to create structured, high-quality presentation slides from scratch! 📄 arxiv.org/abs/2501.00912 🤗 huggingface.co/spaces/JiaxinG… 🔗 github.com/para-lost/Auto… Berkeley AI Research Language Technologies Institute | @CarnegieMellon

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)

Decentralized Diffusion Models UC Berkeley and Luma AI introduce Decentralized Diffusion Models, a way to train diffusion models on decentralized compute with no communication between nodes. Leveraging the associative property of the marginal flow (due to it being a linear
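The truncated sentence above points at the marginal flow being linear. A small numerical check of that property, under assumptions of our own rather than the DDM setup: a rectified-flow path x_t = t·x1 + (1 − t)·ε with ε ~ N(0, I), an empirical dataset treated as point masses, and an arbitrary two-way split of the data standing in for per-node partitions. It shows that the marginal velocity over the full dataset equals a mixture of per-partition velocities, each weighted by how much likelihood its partition assigns to x_t (in this toy, those weights play the role of a router). The function names are hypothetical; this is a sketch, not the DDM training code.

```python
import numpy as np

def log_gauss_kernel(x_t, data, t):
    # log N(x_t; t * x1, (1 - t)^2 I) for every data point x1, dropping the
    # normalizing constant, which is identical for all points at a given t.
    sq = np.sum((x_t[None, :] - t * data) ** 2, axis=1)
    return -0.5 * sq / (1.0 - t) ** 2

def logsumexp(a):
    # Numerically stable log(sum(exp(a))).
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def marginal_velocity(x_t, data, t):
    # E[(x1 - x_t) / (1 - t) | x_t] with a uniform prior over the data points.
    logw = log_gauss_kernel(x_t, data, t)
    w = np.exp(logw - logsumexp(logw))
    return (w[:, None] * (data - x_t[None, :])).sum(axis=0) / (1.0 - t)

def mixture_velocity(x_t, partitions, t):
    # Each "expert" only sees its own partition. Partition k is weighted by
    # p(k | x_t), proportional to the summed likelihood of its points.
    log_mass = np.array([logsumexp(log_gauss_kernel(x_t, p, t)) for p in partitions])
    weights = np.exp(log_mass - logsumexp(log_mass))
    experts = np.stack([marginal_velocity(x_t, p, t) for p in partitions])
    return (weights[:, None] * experts).sum(axis=0)

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))
x_t = rng.normal(size=2)
t = 0.6
full = marginal_velocity(x_t, data, t)
split = mixture_velocity(x_t, [data[:80], data[80:]], t)
print(np.allclose(full, split))  # True
```

The equality holds because the marginal velocity is an expectation over data points, and an expectation over a union of partitions decomposes into a weighted sum of per-partition expectations.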

David McAllister (@davidrmcall)

Decentralized Diffusion Models power stronger models trained on more accessible infrastructure. DDMs mitigate the networking bottleneck that locks training into expensive and power-hungry centralized clusters. They scale gracefully to billions of parameters and generate

Hang Gao (@hangg70)

Very excited to share Stable Virtual Camera, a generalist diffusion model for view synthesis: stable-virtual-camera.github.io It scales well with data and works out of the box for different NVS tasks. Code and 🤗 demo are released! 🧵(1/N)

Junyi Zhang (@junyi42)

Introducing St4RTrack!🖖 Simultaneous 4D Reconstruction and Tracking in world coordinates, in a feed-forward manner, just by changing the meaning of two pointmaps! st4rtrack.github.io

Chung Min Kim (@chungminkim)

Excited to introduce PyRoki ("Python Robot Kinematics"): easier IK, trajectory optimization, motion retargeting... with an open-source toolkit on both CPU and GPU

Songwei Ge (@songwei_ge)

Are embeddings from decoder-only transformers, like Mistral and Qwen, worse than T5 embeddings for text-to-image models? Not really! We find that when the embeddings are extracted from the right layers and normalized, they can outperform T5 embeddings.
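The finding above hinges on two choices: which layer's hidden states to read, and how to normalize them. A minimal sketch of that pipeline, assuming the hidden states have already been collected as a stacked array of shape (num_layers + 1, seq_len, dim) (for example, from a Hugging Face Transformers model called with `output_hidden_states=True`) and assuming a parameter-free per-token layer normalization. The layer index, the normalization, and the function name are placeholders for illustration, not the settings reported in the work.

```python
import numpy as np

def extract_text_embedding(hidden_states, layer, eps=1e-6):
    """Pick one layer's token states and normalize each token vector.

    hidden_states: array (num_layers + 1, seq_len, dim); index 0 is the
    embedding-layer output and index i the output of block i, matching the
    layout of Hugging Face's output_hidden_states tuple.
    layer: hypothetical choice of which layer to read from.
    """
    h = hidden_states[layer]  # (seq_len, dim)
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    # Parameter-free layer norm: zero mean, unit variance per token.
    return (h - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# Stand-in for a hypothetical 28-block decoder; the scale grows with depth
# here purely to show the normalization removing it.
fake = rng.normal(size=(29, 16, 64)) * np.arange(1, 30)[:, None, None]
emb = extract_text_embedding(fake, layer=20)
print(emb.shape)  # (16, 64)
```

Normalizing per token puts embeddings from any layer on a common scale, which matters if the text-to-image model was tuned to a particular embedding magnitude.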

Yutong Bai (@yutongbai1002)

What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to

Konpat Ta Preechakul (@phizaz)

Some problems can’t be rushed—they can only be done step by step, no matter how many people or processors you throw at them. We’ve scaled AI by making everything bigger and more parallel: Our models are parallel. Our scaling is parallel. Our GPUs are parallel. But what if the

Songwei Ge (@songwei_ge)

Training diffusion-style policies with RL can be as easy as training Gaussian policies. We introduce Flow Policy Optimization (FPO) — bringing flow matching into the policy gradient world. Multimodal. Sampling-agnostic. More expressive than Gaussians.
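One way to read "bringing flow matching into the policy gradient world": a PPO-style clipped objective in which the intractable likelihood ratio of a flow policy is replaced by a proxy built from conditional flow matching (CFM) losses. The sketch below assumes that proxy is exp(L_old − L_new), with both losses averaged over the same Monte Carlo draws of noise and flow time. It is written in NumPy with gradients omitted, the interpolation convention x_τ = τ·a + (1 − τ)·ε and all function names are our own assumptions, and it is not the released FPO code.

```python
import numpy as np

def cfm_losses(velocity_fn, actions, eps, tau):
    """Per-sample conditional flow matching losses.

    actions: (batch, act_dim); eps: (batch, n_mc, act_dim) noise draws;
    tau: (batch, n_mc) flow times in [0, 1]. Returns (batch, n_mc).
    """
    a = actions[:, None, :]
    x_tau = tau[..., None] * a + (1.0 - tau[..., None]) * eps
    target = a - eps  # velocity of the straight path from eps to a
    pred = velocity_fn(x_tau, tau)
    return np.sum((pred - target) ** 2, axis=-1)

def fpo_surrogate(loss_new, loss_old, advantages, clip=0.2):
    # Assumed ratio proxy: a lower CFM loss under the new policy suggests the
    # action became more likely, so the ratio rises above 1.
    ratio = np.exp(loss_old.mean(axis=-1) - loss_new.mean(axis=-1))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * advantages
    return np.minimum(unclipped, clipped).mean()  # objective to maximize

rng = np.random.default_rng(0)
batch, n_mc, act_dim = 4, 8, 2
actions = rng.normal(size=(batch, act_dim))
eps = rng.normal(size=(batch, n_mc, act_dim))
tau = rng.uniform(size=(batch, n_mc))
adv = rng.normal(size=batch)

def zero_velocity(x, t):  # hypothetical untrained velocity field
    return np.zeros_like(x)

loss_old = cfm_losses(zero_velocity, actions, eps, tau)
# Identical policies give ratio 1, so the surrogate reduces to the mean advantage.
print(np.isclose(fpo_surrogate(loss_old, loss_old, adv), adv.mean()))  # True
```

Because the ratio only needs CFM losses, any sampler can be used to draw actions at rollout time, which is one plausible reading of "sampling-agnostic" above.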