Songwei Ge (@songwei_ge) Twitter Tweets • TwiCopy

maxwell jones

a year ago

Learning style from a single image is difficult, but what if you had access to an **image pair** instead? I’m excited to share our #SIGGRAPHASIA2024 work PairCustomization, on customizing text-to-image models with a single image pair!! project page: paircustomization.github.io

34 likes, 2 replies, 11 retweets


Songwei Ge

@songwei_ge

a year ago

David and Aleks are going to present our work on making sense of SDS this Friday at #NeurIPS2024! Please check it out if you are around!

43 likes, 0 replies, 2 retweets


Songwei Ge

@songwei_ge

a year ago

Check out our new project on making 3D illusions! Here is my favorite example, "Finding Nemo"

44 likes, 1 reply, 3 retweets


Jia-Bin Huang

@jbhuang0604

a year ago

As my kids are singing APT non-stop these days, I did a bit of reverse engineering of the APT music video and tried to understand why the MV is so addictive. Here is what I learned.

880 likes, 25 replies, 85 retweets


Lea Müller

@leamue27

a year ago

- Humans and Structure from Motion - We jointly reconstruct 3D humans, scene point cloud, and cameras from images captured with sparse uncalibrated cameras. ✨Enjoy reading & happy holidays✨ Project page: muelea.github.io/hsfm

496 likes, 6 replies, 65 retweets


Ming-Yu Liu

@liu_mingyu

10 months ago

github.com/NVIDIA/Cosmos Cosmos is a developer-first platform designed to help physical AI builders accelerate their development. It has pre-trained world foundation models (diffusion & autoregressive) in different sizes and video tokenizers. They are open models with permissive

680 likes, 11 replies, 161 retweets


Jiaxin Ge

@aomaru_21490

10 months ago

Introducing "AutoPresent: Designing Structured Visuals From Scratch". We employ code generation to create structured, high-quality presentation slides from scratch! 📄 arxiv.org/abs/2501.00912 🤗 huggingface.co/spaces/JiaxinG… 🔗 github.com/para-lost/Auto… Berkeley AI Research Language Technologies Institute | @CarnegieMellon

174 likes, 4 replies, 70 retweets


Tanishq Mathew Abraham, Ph.D.

@iscienceluvr

10 months ago

Decentralized Diffusion Models UC Berkeley and Luma AI introduce Decentralized Diffusion Models, a way to train diffusion models on decentralized compute with no communication between nodes. Leveraging the associative property of the marginal flow (due to it being a linear

707 likes, 13 replies, 112 retweets


David McAllister

@davidrmcall

10 months ago

Decentralized Diffusion Models power stronger models trained on more accessible infrastructure. DDMs mitigate the networking bottleneck that locks training into expensive and power-hungry centralized clusters. They scale gracefully to billions of parameters and generate

241 likes, 6 replies, 44 retweets


Hang Gao

@hangg70

8 months ago

Very excited to share Stable Virtual Camera, a generalist diffusion model for view synthesis: stable-virtual-camera.github.io It scales well with data, and works out-the-box for different NVS tasks. Code and 🤗 demo are released! 🧵(1/N)

545 likes, 9 replies, 65 retweets


Junyi Zhang

@junyi42

7 months ago

Introducing St4RTrack!🖖 Simultaneous 4D Reconstruction and Tracking in the world coordinate feed-forwardly, just by changing the meaning of two pointmaps! st4rtrack.github.io

267 likes, 6 replies, 51 retweets


Songwei Ge

@songwei_ge

6 months ago

Check out this super cool project that enables humanoid learning from casual human video!! Very exciting moment!

5 likes, 0 replies, 0 retweets


Chung Min Kim

@chungminkim

6 months ago

Excited to introduce PyRoki ("Python Robot Kinematics"): easier IK, trajectory optimization, motion retargeting... with an open-source toolkit on both CPU and GPU

1.1K likes, 22 replies, 166 retweets


Songwei Ge

@songwei_ge

5 months ago

Are embeddings from decoder-only transformers, like Mistral and Qwen, worse than T5 embedding for text-to-image models? Not really! We find that when the embeddings from proper layers are extracted and normalized, they can outperform T5 embedding.

16 likes, 0 replies, 0 retweets


Yutong Bai

@yutongbai1002

5 months ago

What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to

283 likes, 17 replies, 74 retweets


Konpat Ta Preechakul

@phizaz

4 months ago

Some problems can’t be rushed—they can only be done step by step, no matter how many people or processors you throw at them. We’ve scaled AI by making everything bigger and more parallel: Our models are parallel. Our scaling is parallel. Our GPUs are parallel. But what if the

359 likes, 17 replies, 63 retweets


Songwei Ge

@songwei_ge

4 months ago

Training diffusion-style policies with RL can be as easy as training Gaussian policies. We introduce Flow Policy Optimization (FPO) — bringing flow matching into the policy gradient world. Multimodal. Sampling-agnostic. More expressive than Gaussians.

67 likes, 3 replies, 5 retweets
