Gaurav Parmar (@gauravtparmar)'s Twitter Profile
Gaurav Parmar

@gauravtparmar

PhD @ CMU

ID: 1274645824572514305

Link: https://gauravparmar.com/ · Joined: 21-06-2020 10:10:29

24 Tweets

286 Followers

361 Following

Mark Sheinin (@marksheinin) 's Twitter Profile Photo

Ever wondered how humans would perceive the world with eyes 1000x faster, sensitive to infrared, or capable of "seeing" coffee cup vibrations from our voice? For humans, it's a mystery. But for 𝗺𝗮𝗰𝗵𝗶𝗻𝗲𝘀, my group at WIS aims to answer these questions, and you can take part!

AK (@_akhaliq) 's Twitter Profile Photo

One-Step Image Translation with Text-to-Image Models

In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning.

Jun-Yan Zhu (@junyanz89) 's Twitter Profile Photo

[1/2] We’ve released the code for #pix2pixturbo and #CycleGANTurbo. These conditional GANs are able to adapt a text-to-image model such as SD-Turbo for both paired and unpaired image translation with a single step (0.11 sec on A100 and 0.29 sec on A6000). Try our code and the
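For readers curious what single-step inference looks like in practice, here is a minimal sketch using the plain SD-Turbo backbone through Hugging Face diffusers. This is not the pix2pix-turbo / CycleGAN-Turbo API from the released repo, only an illustration of the one-step idea the method builds on; the file names are placeholders.

```python
# A minimal sketch of single-step image-to-image generation with the SD-Turbo
# backbone via Hugging Face diffusers. NOT the pix2pix-turbo/CycleGAN-Turbo
# release code; it only illustrates one-step inference.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

init_image = load_image("input.png").resize((512, 512))  # placeholder input

# One denoising step: strength=1.0 keeps num_inference_steps * strength >= 1,
# and guidance_scale=0.0 because SD-Turbo is distilled to run without CFG.
result = pipe(
    "a photo of a city street at night",
    image=init_image,
    num_inference_steps=1,
    strength=1.0,
    guidance_scale=0.0,
).images[0]
result.save("output.png")
```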

Radamés Ajna (@radamar) 's Twitter Profile Photo

Testing the new pix2pix-Turbo in real time, a very interesting GAN architecture that leverages the SD-Turbo model. Here I'm using the edge2image LoRA with single-step inference 🤯

Or Patashnik (@opatashnik) 's Twitter Profile Photo

Our new inversion method facilitates interactive image editing with few-step diffusion models 🏃‍♀️🏃 I played with it all morning, so much fun -- less than 2 sec per edit 😲 Try the demo! Project page: garibida.github.io/ReNoise-Invers… Cool demo: huggingface.co/spaces/garibid…
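The core trick in ReNoise is to replace the crude one-shot approximation used in each DDIM inversion step with a short fixed-point iteration. Below is a rough conceptual sketch, not the authors' released code: `unet(z, t, cond)` is an assumed noise-prediction callable and `scheduler` an assumed object exposing `alphas_cumprod`.

```python
# Conceptual sketch of the renoising idea. Plain DDIM inversion estimates
# z_{t_next} from z_t using a noise prediction at z_t, which is inaccurate
# with few steps; the refinement below re-evaluates the prediction at the
# current estimate of z_{t_next} a few times before accepting it.
import torch

@torch.no_grad()
def renoise_inversion_step(unet, scheduler, z_t, t, t_next, cond, n_iters=4):
    a_t = scheduler.alphas_cumprod[t]
    a_next = scheduler.alphas_cumprod[t_next]
    z_next = z_t  # initial guess for the more-noised latent
    for _ in range(n_iters):
        eps = unet(z_next, t_next, cond)  # re-estimate noise at current guess
        # Clean-image estimate implied by z_t, then one DDIM step "upward"
        x0 = (z_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        z_next = a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps
    return z_next
```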

dinesh reddy (@dineshredy) 's Twitter Profile Photo

WALT3D has been accepted as an Oral at #CVPR (top 90 out of 12000)! WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects under Occlusion. Project page: cs.cmu.edu/~walt3d. Key idea: convert your image to 3D under severe occlusions.

Amil Dravid (@_amildravid) 's Twitter Profile Photo

The latent space of earlier generative models like GANs can linearly encode concepts of the data. What if the data were model weights? We present weights2weights, a subspace in diffusion weights that behaves as an interpretable latent space over customized diffusion models.
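A toy sketch of that idea, illustrative rather than the paper's actual pipeline: treat each customized model's flattened weights as one data point, fit a linear subspace to the collection, then edit inside that subspace. All function names here are hypothetical.

```python
# Toy weights2weights-style sketch (not the paper's code): fit a linear
# subspace over a collection of customized models' weights, then move a
# model along directions found in that subspace.
import numpy as np

def fit_weight_subspace(weights, k=100):
    """weights: (n_models, n_params) array of flattened model weights."""
    mean = weights.mean(axis=0)
    # Principal directions of the weight collection via SVD
    _, _, vt = np.linalg.svd(weights - mean, full_matrices=False)
    return mean, vt[:k]                     # basis: (k, n_params)

def edit_in_subspace(w, mean, basis, direction, alpha=1.0):
    """Shift one model's weights along a direction in the subspace."""
    coeffs = basis @ (w - mean)             # project into the subspace
    coeffs += alpha * direction             # move along an interpretable axis
    return mean + coeffs @ basis            # decode back to full weight space
```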

Ananye Agarwal (@anag004) 's Twitter Profile Photo

As a founding researcher, I have seen Skild AI grow exponentially. We have moved offices 3 times, grown 10x in human (and robot) numbers, and become a unicorn in less than a year. If you want to scale up robotics and work with a cracked team of engineers and scientists, come to Skild AI.

Shivam Duggal (@shivamduggal4) 's Twitter Profile Photo

Current vision systems use fixed-length representations for all images. In contrast, human intelligence and LLMs (e.g., OpenAI o1) adjust compute budgets based on the input. Since different images demand different processing & memory, how can we enable vision systems to be adaptive? 🧵
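One simple way to picture input-adaptive representations, as an illustrative toy and not necessarily the thread's actual method: grow the token budget per image until reconstruction is good enough. The `encoder`/`decoder` callables below are hypothetical.

```python
# Illustrative toy of input-adaptive representation length: simple images
# stop at a small token budget, complex ones escalate to larger budgets.
import torch

def adaptive_encode(encoder, decoder, image,
                    budgets=(32, 64, 128, 256), max_err=0.01):
    for n in budgets:
        tokens = encoder(image, n)                        # try n tokens
        err = torch.mean((decoder(tokens) - image) ** 2)  # reconstruction error
        if err.item() <= max_err:                         # good enough: stop
            return tokens
    return tokens                                         # largest budget wins
```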

Homanga Bharadhwaj (@mangahomanga) 's Twitter Profile Photo

HandsOnVLM: An in-context action prediction assistant for daily activities. It enables predicting future interaction trajectories of human hands in a scene given natural language queries. Evaluations across 100s of diverse scenarios in homes, offices, and outdoors! 1/n

Jackson (Kuan-Chieh) Wang (@kcjacksonwang) 's Twitter Profile Photo

One of the motivating applications of this project was to emulate a "photo album" experience. With VisualComposer, you can create image variations from one image. But it also became a more general tool where you can not only generate image variations, but also compose any visual

Kfir Aberman (@abermankfir) 's Twitter Profile Photo

Text prompts have shaped how we compose images with foundation models. But what if we could simply inject Visual Prompts instead? We introduce 🌟Visual Composer🌟 which achieves high-fidelity compositions of subjects and backgrounds with visual prompts! snap-research.github.io/visual-compose…

Muyang Li (@lmxyy1999) 's Twitter Profile Photo

🚀 How to run 12B FLUX.1 on your local laptop with 2-3× speedup? Come check out our #SVDQuant (#ICLR2025 Spotlight) poster session! 🎉 
🗓️ When: Friday, Apr 25, 10–12:30 (Singapore time)
📍 Where: Hall 3 + Hall 2B, Poster 169
📌 Poster: tinyurl.com/poster-svdquant
🎮 Demo:
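Roughly, SVDQuant splits each weight matrix into a small high-precision low-rank branch that absorbs outliers, plus a residual with a narrow dynamic range that quantizes well to 4 bits. A toy sketch of that decomposition follows; it is illustrative only (the real method also migrates activation outliers into the weights first and uses fused low-bit kernels, none of which is shown here).

```python
# Toy SVDQuant-style decomposition: W ~ L1 @ L2 (small, high precision)
# plus a 4-bit-quantized residual R. Illustrative, not the released kernels.
import torch

def svdquant_decompose(W, rank=32, n_bits=4):
    U, S, Vh = torch.linalg.svd(W.float(), full_matrices=False)
    L1 = U[:, :rank] * S[:rank]            # (out, rank), kept in 16-bit
    L2 = Vh[:rank]                         # (rank, in), kept in 16-bit
    R = W.float() - L1 @ L2                # residual: smaller dynamic range

    qmax = 2 ** (n_bits - 1) - 1           # e.g. 7 for 4-bit symmetric
    scale = R.abs().max() / qmax
    R_q = torch.clamp(torch.round(R / scale), -qmax - 1, qmax).to(torch.int8)
    return L1.half(), L2.half(), R_q, scale

def svdquant_matmul(x, L1, L2, R_q, scale):
    x = x.float()
    low_rank = x @ L2.float().T @ L1.float().T   # high-precision branch
    residual = x @ (R_q.float() * scale).T       # dequantized 4-bit branch
    return low_rank + residual
```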
Xun Huang (@xunhuang1995) 's Twitter Profile Photo

Real-time video generation is finally real — without sacrificing quality. Introducing Self-Forcing, a new paradigm for training autoregressive diffusion models. The key to high quality? Simulate the inference process during training by unrolling transformers with KV caching.
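The gist, in sketch form: instead of teacher-forcing on ground-truth frames, roll the model out on its own outputs during training, exactly as at inference, reusing the KV cache, and apply the loss to that rollout. The `model.step` API below is hypothetical, not the release's interface.

```python
# Toy sketch of the Self-Forcing idea: the training loop simulates inference
# by feeding the model its OWN previous outputs, with cached keys/values so
# the growing context is not re-encoded at every step.
import torch

def self_forcing_loss(model, first_frame, target_video, loss_fn):
    n_frames = target_video.shape[1]             # (batch, time, ...)
    frames, kv_cache = [first_frame], None
    for _ in range(n_frames - 1):
        # Feed back our own previous output instead of the ground truth
        next_frame, kv_cache = model.step(frames[-1], kv_cache)
        frames.append(next_frame)
    rollout = torch.stack(frames, dim=1)
    return loss_fn(rollout, target_video)        # train on simulated inference
```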

Kfir Aberman (@abermankfir) 's Twitter Profile Photo

🚀 Career Update After years pushing the boundaries of Generative AI at some of the world’s top companies -> I’m going startup. I’ve joined Decart as a founding team member, leading the charge to build our San Francisco office from the ground up. decart.ai