Gaurav Parmar (@gauravtparmar)'s Twitter Profile
Gaurav Parmar

@gauravtparmar

PhD @ CMU

ID: 1274645824572514305

Link: https://gauravparmar.com/ · Joined: 21-06-2020 10:10:29

24 Tweets

286 Followers

361 Following

Mark Sheinin (@marksheinin) 's Twitter Profile Photo

Ever wondered how humans would perceive the world with eyes 1000x faster, sensitive to infrared, or capable of "seeing" coffee cup vibrations from our voice? For humans, it's a mystery. But for 𝗺𝗮𝗰𝗵𝗶𝗻𝗲𝘀, my group at WIS aims to answer these questions, and you can take part!

AK (@_akhaliq) 's Twitter Profile Photo

One-Step Image Translation with Text-to-Image Models

In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning.

Jun-Yan Zhu (@junyanz89) 's Twitter Profile Photo

[1/2] We’ve released the code for #pix2pixturbo and #CycleGANTurbo. These conditional GANs are able to adapt a text-to-image model such as SD-Turbo for both paired and unpaired image translation with a single step (0.11 sec on A100 and 0.29 sec on A6000). Try our code and the
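For readers curious what single-step inference looks like in practice, here is a minimal sketch using the plain SD-Turbo backbone through Hugging Face diffusers. This is not the pix2pix-turbo / CycleGAN-Turbo API from the released repo, only an illustration of the one-step idea the method builds on; the file names are placeholders.

```python
# A minimal sketch of single-step image-to-image generation with the SD-Turbo
# backbone via Hugging Face diffusers. NOT the pix2pix-turbo/CycleGAN-Turbo
# release code; it only illustrates one-step inference.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

init_image = load_image("input.png").resize((512, 512))  # placeholder input

# One denoising step: strength=1.0 keeps num_inference_steps * strength >= 1,
# and guidance_scale=0.0 because SD-Turbo is distilled to run without CFG.
result = pipe(
    "a photo of a city street at night",
    image=init_image,
    num_inference_steps=1,
    strength=1.0,
    guidance_scale=0.0,
).images[0]
result.save("output.png")
```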

Radamés Ajna (@radamar) 's Twitter Profile Photo

Testing the new pix2pix-Turbo in real time, a very interesting GAN architecture that leverages the SD-Turbo model. Here I'm using the edge2image LoRA with single-step inference 🤯

Or Patashnik (@opatashnik) 's Twitter Profile Photo

Our new inversion method facilitates interactive image editing with few-step diffusion models 🏃‍♀️🏃 I played with it all morning, so much fun -- less than 2 sec per edit 😲 Try the demo! Project page: garibida.github.io/ReNoise-Invers… Cool demo: huggingface.co/spaces/garibid…
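The core trick in ReNoise is to replace the crude one-shot approximation used in each DDIM inversion step with a short fixed-point iteration. Below is a rough conceptual sketch, not the authors' released code: `unet(z, t, cond)` is an assumed noise-prediction callable and `scheduler` an assumed object exposing `alphas_cumprod`.

```python
# Conceptual sketch of the renoising idea. Plain DDIM inversion estimates
# z_{t_next} from z_t using a noise prediction at z_t, which is inaccurate
# with few steps; the refinement below re-evaluates the prediction at the
# current estimate of z_{t_next} a few times before accepting it.
import torch

@torch.no_grad()
def renoise_inversion_step(unet, scheduler, z_t, t, t_next, cond, n_iters=4):
    a_t = scheduler.alphas_cumprod[t]
    a_next = scheduler.alphas_cumprod[t_next]
    z_next = z_t  # initial guess for the more-noised latent
    for _ in range(n_iters):
        eps = unet(z_next, t_next, cond)  # re-estimate noise at current guess
        # Clean-image estimate implied by z_t, then one DDIM step "upward"
        x0 = (z_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        z_next = a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps
    return z_next
```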

dinesh reddy (@dineshredy) 's Twitter Profile Photo

WALT3D has been accepted as an Oral at #CVPR (top 90 out of 12000)! WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects under Occlusion. Project page: cs.cmu.edu/~walt3d. Key idea: convert your image to 3D under severe occlusions.

Amil Dravid (@_amildravid) 's Twitter Profile Photo

The latent space of earlier generative models like GANs can linearly encode concepts of the data. What if the data were model weights? We present weights2weights, a subspace in diffusion weights that behaves as an interpretable latent space over customized diffusion models.
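A toy sketch of that idea, illustrative rather than the paper's actual pipeline: treat each customized model's flattened weights as one data point, fit a linear subspace to the collection, then edit inside that subspace. All function names here are hypothetical.

```python
# Toy weights2weights-style sketch (not the paper's code): fit a linear
# subspace over a collection of customized models' weights, then move a
# model along directions found in that subspace.
import numpy as np

def fit_weight_subspace(weights, k=100):
    """weights: (n_models, n_params) array of flattened model weights."""
    mean = weights.mean(axis=0)
    # Principal directions of the weight collection via SVD
    _, _, vt = np.linalg.svd(weights - mean, full_matrices=False)
    return mean, vt[:k]                     # basis: (k, n_params)

def edit_in_subspace(w, mean, basis, direction, alpha=1.0):
    """Shift one model's weights along a direction in the subspace."""
    coeffs = basis @ (w - mean)             # project into the subspace
    coeffs += alpha * direction             # move along an interpretable axis
    return mean + coeffs @ basis            # decode back to full weight space
```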

Ananye Agarwal (@anag004) 's Twitter Profile Photo

As a founding researcher, I have seen Skild AI grow exponentially. We have moved offices 3 times, grown 10x in human (and robot) numbers, and become a unicorn in less than a year. If you want to scale up robotics and work with a cracked team of engineers and scientists, come to Skild AI.

Shivam Duggal (@shivamduggal4) 's Twitter Profile Photo

Current vision systems use fixed-length representations for all images. In contrast, human intelligence and LLMs (e.g., OpenAI o1) adjust compute budgets based on the input. Since different images demand different processing & memory, how can we enable vision systems to be adaptive? 🧵
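One simple way to picture input-adaptive representations, as an illustrative toy and not necessarily the thread's actual method: grow the token budget per image until reconstruction is good enough. The `encoder`/`decoder` callables below are hypothetical.

```python
# Illustrative toy of input-adaptive representation length: simple images
# stop at a small token budget, complex ones escalate to larger budgets.
import torch

def adaptive_encode(encoder, decoder, image,
                    budgets=(32, 64, 128, 256), max_err=0.01):
    for n in budgets:
        tokens = encoder(image, n)                        # try n tokens
        err = torch.mean((decoder(tokens) - image) ** 2)  # reconstruction error
        if err.item() <= max_err:                         # good enough: stop
            return tokens
    return tokens                                         # largest budget wins
```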

Homanga Bharadhwaj (@mangahomanga) 's Twitter Profile Photo

HandsOnVLM: An in-context action prediction assistant for daily activities. It enables predicting future interaction trajectories of human hands in a scene given natural language queries. Evaluations across 100s of diverse scenarios in homes, offices, and outdoors! 1/n

Jackson (Kuan-Chieh) Wang (@kcjacksonwang) 's Twitter Profile Photo

One of the motivating applications of this project was to emulate a "photo album" experience. With VisualComposer, you can create image variations from one image. But it also became a more general tool where you can not only generate image variations, but also compose any visual

Kfir Aberman (@abermankfir) 's Twitter Profile Photo

Text prompts have shaped how we compose images with foundation models. But what if we could simply inject Visual Prompts instead? We introduce 🌟Visual Composer🌟 which achieves high-fidelity compositions of subjects and backgrounds with visual prompts! snap-research.github.io/visual-compose…

Muyang Li (@lmxyy1999) 's Twitter Profile Photo

🚀 How to run 12B FLUX.1 on your local laptop with 2-3× speedup? Come check out our #SVDQuant (#ICLR2025 Spotlight) poster session! 🎉 
🗓️ When: Friday, Apr 25, 10–12:30 (Singapore time)
📍 Where: Hall 3 + Hall 2B, Poster 169
📌 Poster: tinyurl.com/poster-svdquant
🎮 Demo:
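Roughly, SVDQuant splits each weight matrix into a small high-precision low-rank branch that absorbs outliers, plus a residual with a narrow dynamic range that quantizes well to 4 bits. A toy sketch of that decomposition follows; it is illustrative only (the real method also migrates activation outliers into the weights first and uses fused low-bit kernels, none of which is shown here).

```python
# Toy SVDQuant-style decomposition: W ~ L1 @ L2 (small, high precision)
# plus a 4-bit-quantized residual R. Illustrative, not the released kernels.
import torch

def svdquant_decompose(W, rank=32, n_bits=4):
    U, S, Vh = torch.linalg.svd(W.float(), full_matrices=False)
    L1 = U[:, :rank] * S[:rank]            # (out, rank), kept in 16-bit
    L2 = Vh[:rank]                         # (rank, in), kept in 16-bit
    R = W.float() - L1 @ L2                # residual: smaller dynamic range

    qmax = 2 ** (n_bits - 1) - 1           # e.g. 7 for 4-bit symmetric
    scale = R.abs().max() / qmax
    R_q = torch.clamp(torch.round(R / scale), -qmax - 1, qmax).to(torch.int8)
    return L1.half(), L2.half(), R_q, scale

def svdquant_matmul(x, L1, L2, R_q, scale):
    x = x.float()
    low_rank = x @ L2.float().T @ L1.float().T   # high-precision branch
    residual = x @ (R_q.float() * scale).T       # dequantized 4-bit branch
    return low_rank + residual
```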
Xun Huang (@xunhuang1995) 's Twitter Profile Photo

Real-time video generation is finally real — without sacrificing quality. Introducing Self-Forcing, a new paradigm for training autoregressive diffusion models. The key to high quality? Simulate the inference process during training by unrolling transformers with KV caching.
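The gist, in sketch form: instead of teacher-forcing on ground-truth frames, roll the model out on its own outputs during training, exactly as at inference, reusing the KV cache, and apply the loss to that rollout. The `model.step` API below is hypothetical, not the release's interface.

```python
# Toy sketch of the Self-Forcing idea: the training loop simulates inference
# by feeding the model its OWN previous outputs, with cached keys/values so
# the growing context is not re-encoded at every step.
import torch

def self_forcing_loss(model, first_frame, target_video, loss_fn):
    n_frames = target_video.shape[1]             # (batch, time, ...)
    frames, kv_cache = [first_frame], None
    for _ in range(n_frames - 1):
        # Feed back our own previous output instead of the ground truth
        next_frame, kv_cache = model.step(frames[-1], kv_cache)
        frames.append(next_frame)
    rollout = torch.stack(frames, dim=1)
    return loss_fn(rollout, target_video)        # train on simulated inference
```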

Kfir Aberman (@abermankfir) 's Twitter Profile Photo

🚀 Career Update After years pushing the boundaries of Generative AI at some of the world’s top companies -> I’m going startup. I’ve joined Decart as a founding team member, leading the charge to build our San Francisco office from the ground up. decart.ai