Saketh Rambhatla (@rssaketh)'s Twitter Profile
Saketh Rambhatla

@rssaketh

PhD student at the University of Maryland, College Park

ID: 200879110

Link: https://rssaketh.github.io | Joined: 10-10-2010 14:38:11

79 Tweets

200 Followers

582 Following

XuDong Wang (@xdwang101)'s Twitter Profile Photo

🚀 Excited to share InstanceDiffusion @CVPR2024! It adds precise instance-level control for image gen: free-form text conditions per instance and diverse location specs—points, scribbles, boxes & instance masks Code: shorturl.at/dtxSW arXiv: shorturl.at/rQS14 1/n
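
The instance-level controls named above (per-instance captions plus points, scribbles, boxes, or masks) can be pictured with a minimal, purely hypothetical sketch of how such conditions might be structured; the dataclasses and field names below are my own illustration, not the InstanceDiffusion API.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class InstanceCondition:
    caption: str                                              # free-form text for this instance
    box: Optional[Tuple[float, float, float, float]] = None   # (x0, y0, x1, y1), normalized
    points: Optional[List[Tuple[float, float]]] = None        # a single point or scribble vertices
    mask: Optional[List[List[int]]] = None                    # binary instance mask

@dataclass
class GenerationRequest:
    global_prompt: str
    instances: List[InstanceCondition] = field(default_factory=list)

# Example: two instances, one located by a box and one by a point.
req = GenerationRequest(
    global_prompt="a park scene at sunset",
    instances=[
        InstanceCondition(caption="a golden retriever", box=(0.1, 0.5, 0.45, 0.95)),
        InstanceCondition(caption="a red kite in the sky", points=[(0.7, 0.15)]),
    ],
)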

Pulkit (@pulkitkumar95)'s Twitter Profile Photo

📢 Point tracking 🤝 action recognition at #ECCV2024 We've set a new SoTA in few-shot action recognition by harnessing motion data from point tracking and semantic features from SSL. Curious? Visit Poster #203 Thursday AM to see the future of action recognition🔥. Details:🧵
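
As a rough sketch of the recipe described above (motion descriptors from point tracks fused with SSL semantic features, then few-shot classification), here is a minimal, hypothetical illustration; the feature extractors are assumed to exist elsewhere and none of this is the paper's code.

import numpy as np

def fuse_features(track_feats: np.ndarray, ssl_feats: np.ndarray) -> np.ndarray:
    """Concatenate L2-normalized motion and semantic descriptors per video."""
    def l2(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)
    return np.concatenate([l2(track_feats), l2(ssl_feats)], axis=-1)

def few_shot_classify(support: np.ndarray, support_labels: np.ndarray,
                      query: np.ndarray) -> np.ndarray:
    """Nearest-prototype classification over the fused features."""
    classes = np.unique(support_labels)
    prototypes = np.stack([support[support_labels == c].mean(0) for c in classes])
    dists = np.linalg.norm(query[:, None, :] - prototypes[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]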

Roshan Sumbaly (@rsumbaly)'s Twitter Profile Photo

Lights, camera, action - introducing Meta's Movie Gen! Our latest breakthrough in AI-powered media generation, setting a new standard for immersive AI content creation. We're also releasing a 92-page detailed report of what we learned, along with evaluation prompts that we hope

Ishan Misra (@imisra_)'s Twitter Profile Photo

So, this is what we were up to for a while :) Building SOTA foundation models for media -- text-to-video, video editing, personalized videos, video-to-audio. One of the most exciting projects I got to tech lead during my time at Meta!

Mannat Singh (@mannat_singh)'s Twitter Profile Photo

Check out Movie Gen 🎥 Our latest media generation models for video generation, editing, and personalization, with audio generation! 16-second 1080p videos generated through a simple Llama-style 30B transformer. Demo + detailed 92-page technical report 📝⬇️

Manohar Paluri (@manohar_paluri)'s Twitter Profile Photo

Meta Movie Gen is just freakin cool! Generative video foundation models with this quality, precise editing, and personalization unlock value for creators, enable new creative tools, and enable agents that can interact in richer ways, closing the loop on learning to unlock world models!

Shelly Sheynin (@shellysheynin)'s Twitter Profile Photo

I’m thrilled and proud to share our model, Movie Gen, that we've been working on for the past year, and in particular, Movie Gen Edit, for precise video editing. 😍 Look how Movie Gen edited my video!

Yuval Kirstain (@ykirstain)'s Twitter Profile Photo

So proud to be part of the Movie Gen project, pushing GenAI boundaries! Two key insights: 1. Amazing team + high-quality data + clean, scalable code + general architecture + GPUs go brr = SOTA video generation. 2. Video editing *without* supervised data: train a *single* model

Kevin Chih-Yao Ma (@chihyaoma)'s Twitter Profile Photo

Hi friends, say hello to Movie Gen. Over the past couple of months, we've been working hard behind the scenes to bring you the latest advancements in video generation. Movie Gen not only packs text-to-video capability, but also comes with video personalization, editing, and

Roshan Sumbaly (@rsumbaly)'s Twitter Profile Photo

And not just the paper, early next week we'll be releasing our full evaluation sets - the field of media generation would really benefit from having canonical benchmarks. Stay tuned!

Samaneh Azadi (@smnh_azadi)'s Twitter Profile Photo

And here is the most exciting model we have been working on, with special capabilities in text-to-video generation, video personalization, editing, and audio generation! Plus, an invaluable tech report released! Welcome to the world, Movie Gen!

Ishan Misra (@imisra_)'s Twitter Profile Photo

We released 92 pages worth of detail including how to benchmark these models! Super critical for the scientific progress in this field :) We'll also release evaluation benchmarks next week to help the research community 💪

Andrew Brown (@andrew__brown__)'s Twitter Profile Photo

🚨 Internship in Meta GenAI NYC 🚨 I have an open PhD internship position for 2025! Interested in exploring visual generative models (or any other exciting ideas) inside the team that brought you Movie Gen and Emu Video? 📩 Send me a DM with your CV, website, and GScholar profile

Mara Levy (@mlevy1221)'s Twitter Profile Photo

How can we make Imitation Learning generalize? In my latest work we show that a keypoint-based representation can generalize to novel instances of an object and is agnostic to background changes.
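
A minimal sketch of the general idea (keypoint coordinates as the policy input rather than raw pixels, which is one way a representation can ignore background changes); the module and dimensions here are hypothetical, not the paper's implementation.

import torch
import torch.nn as nn

class KeypointPolicy(nn.Module):
    """Imitation policy that consumes 2D object keypoints instead of images."""
    def __init__(self, num_keypoints: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_keypoints * 2, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, num_keypoints, 2) -> actions: (batch, action_dim)
        return self.net(keypoints.flatten(1))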

Mannat Singh (@mannat_singh)'s Twitter Profile Photo

Flow matching can transform one distribution to another. So why do text-to-image models map noise to images instead of directly mapping text to images? Wouldn't it be cool to directly connect modalities together? CrossFlow accomplishes exactly that! cross-flow.github.io
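
The idea above (flow matching directly from the text-embedding distribution to the image-latent distribution, rather than from noise) can be sketched as a single training step; the model signature and shared shapes are assumptions on my part, not the CrossFlow code.

import torch
import torch.nn.functional as F

def flow_matching_step(model, text_emb: torch.Tensor, image_latent: torch.Tensor) -> torch.Tensor:
    """One training step: model(x_t, t) is trained to predict the velocity x1 - x0."""
    x0, x1 = text_emb, image_latent                        # assumed to share a shape
    t = torch.rand(x0.shape[0], *([1] * (x0.dim() - 1)), device=x0.device)
    x_t = (1 - t) * x0 + t * x1                            # linear interpolation path
    target_velocity = x1 - x0
    pred_velocity = model(x_t, t.flatten())                # assumed model signature
    return F.mse_loss(pred_velocity, target_velocity)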

Shijie Wang (@shijiewang20)'s Twitter Profile Photo

How can we better animate images solely following text descriptions? We present Motion Focal Loss (MotiF) (arxiv.org/abs/2412.16153) to better align motion with text descriptions in the text-image-to-video (TI2V) task and release TI2V-Bench, a comprehensive TI2V benchmark. (1/n)
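
To make the idea concrete, here is a hypothetical motion-weighted reconstruction loss in the spirit of what is described above; the actual Motion Focal Loss may well be defined differently (see arxiv.org/abs/2412.16153).

import torch

def motion_weighted_loss(pred: torch.Tensor, target_video: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """pred, target_video: (B, T, C, H, W). Upweight errors where the target video moves."""
    motion = (target_video[:, 1:] - target_video[:, :-1]).abs()           # frame differences
    weights = motion / (motion.mean(dim=(1, 2, 3, 4), keepdim=True) + eps)
    err = (pred[:, 1:] - target_video[:, 1:]) ** 2
    return (weights * err).mean()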

Ching-Yao Chuang (@chingyaochuang)'s Twitter Profile Photo

Super cool to see transformers scaling so effectively for image/video autoencoders! Our model also offers a flexible way to implement variable token lengths.

Rohit Girdhar (@_rohitgirdhar_)'s Twitter Profile Photo

Super excited to share some recent work that shows that pure, text-only LLMs can see and hear without any training! Our approach, called "MILS", uses LLMs with off-the-shelf multimodal models to caption images/videos/audio, improve image generation, do style transfer, and more!

Ishan Misra (@imisra_)'s Twitter Profile Photo

Inference time objectives are amazing :) We show that LLMs can be upgraded to multimodal beings by a simple trick :) No training needed! Works on image generation, editing, style transfer and more!
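
A minimal sketch of what such an inference-time loop could look like: a text-only LLM proposes candidate captions, an off-the-shelf image-text scorer ranks them, and the scores are fed back to the LLM as plain text. The propose and score callables below are placeholders I am assuming, not the MILS interface.

def refine_caption(image, propose, score, rounds: int = 5, keep: int = 3) -> str:
    """propose(prompt) -> list of candidate captions; score(image, caption) -> float."""
    feedback = "Describe the image."
    best = []
    for _ in range(rounds):
        candidates = propose(feedback)                     # LLM generates text-only candidates
        ranked = sorted(candidates, key=lambda c: score(image, c), reverse=True)
        best = ranked[:keep]                               # keep the highest-scoring captions
        feedback = ("These captions scored best so far: " + "; ".join(best)
                    + ". Propose better ones.")
    return best[0] if best else feedback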