Saketh Rambhatla (@rssaketh)'s Twitter Profile
Saketh Rambhatla

@rssaketh

PhD student at the University of Maryland, College Park

ID: 200879110

Link: https://rssaketh.github.io | Joined: 10-10-2010 14:38:11

79 Tweets

200 Followers

582 Following

XuDong Wang (@xdwang101)'s Twitter Profile Photo

🚀 Excited to share InstanceDiffusion @CVPR2024! It adds precise instance-level control for image gen: free-form text conditions per instance and diverse location specs—points, scribbles, boxes & instance masks Code: shorturl.at/dtxSW arXiv: shorturl.at/rQS14 1/n
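
The instance-level controls named above (per-instance captions plus points, scribbles, boxes, or masks) can be pictured with a minimal, purely hypothetical sketch of how such conditions might be structured; the dataclasses and field names below are my own illustration, not the InstanceDiffusion API.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class InstanceCondition:
    caption: str                                              # free-form text for this instance
    box: Optional[Tuple[float, float, float, float]] = None   # (x0, y0, x1, y1), normalized
    points: Optional[List[Tuple[float, float]]] = None        # a single point or scribble vertices
    mask: Optional[List[List[int]]] = None                    # binary instance mask

@dataclass
class GenerationRequest:
    global_prompt: str
    instances: List[InstanceCondition] = field(default_factory=list)

# Example: two instances, one located by a box and one by a point.
req = GenerationRequest(
    global_prompt="a park scene at sunset",
    instances=[
        InstanceCondition(caption="a golden retriever", box=(0.1, 0.5, 0.45, 0.95)),
        InstanceCondition(caption="a red kite in the sky", points=[(0.7, 0.15)]),
    ],
)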

Pulkit (@pulkitkumar95)'s Twitter Profile Photo

📢 Point tracking 🤝 action recognition at #ECCV2024 We've set a new SoTA in few-shot action recognition by harnessing motion data from point tracking and semantic features from SSL. Curious? Visit Poster #203 Thursday AM to see the future of action recognition🔥. Details:🧵
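
As a rough sketch of the recipe described above (motion descriptors from point tracks fused with SSL semantic features, then few-shot classification), here is a minimal, hypothetical illustration; the feature extractors are assumed to exist elsewhere and none of this is the paper's code.

import numpy as np

def fuse_features(track_feats: np.ndarray, ssl_feats: np.ndarray) -> np.ndarray:
    """Concatenate L2-normalized motion and semantic descriptors per video."""
    def l2(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)
    return np.concatenate([l2(track_feats), l2(ssl_feats)], axis=-1)

def few_shot_classify(support: np.ndarray, support_labels: np.ndarray,
                      query: np.ndarray) -> np.ndarray:
    """Nearest-prototype classification over the fused features."""
    classes = np.unique(support_labels)
    prototypes = np.stack([support[support_labels == c].mean(0) for c in classes])
    dists = np.linalg.norm(query[:, None, :] - prototypes[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]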

Roshan Sumbaly (@rsumbaly)'s Twitter Profile Photo

Lights, camera, action - introducing Meta's Movie Gen! Our latest breakthrough in AI-powered media generation, setting a new standard for immersive AI content creation. We're also releasing a 92-page detailed report of what we learned, along with evaluation prompts that we hope

Ishan Misra (@imisra_)'s Twitter Profile Photo

So, this is what we were up to for a while :) Building SOTA foundation models for media -- text-to-video, video editing, personalized videos, video-to-audio. One of the most exciting projects I got to tech lead during my time at Meta!

Mannat Singh (@mannat_singh)'s Twitter Profile Photo

Check out Movie Gen 🎥 Our latest media generation models for video generation, editing, and personalization, with audio generation! 16-second 1080p videos generated through a simple Llama-style 30B transformer. Demo + detailed 92-page technical report 📝⬇️

Manohar Paluri (@manohar_paluri)'s Twitter Profile Photo

Meta Movie Gen is just freakin cool! Generative video foundation models with this quality, precise editing, and personalization unlock value for creators, enable new creative tools, and enable agents that can interact in richer ways, closing the loop on learning to unlock world models!

Shelly Sheynin (@shellysheynin)'s Twitter Profile Photo

I’m thrilled and proud to share our model, Movie Gen, that we've been working on for the past year, and in particular, Movie Gen Edit, for precise video editing. 😍 Look how Movie Gen edited my video!

Yuval Kirstain (@ykirstain)'s Twitter Profile Photo

So proud to be part of the Movie Gen project, pushing GenAI boundaries! Two key insights: 1. Amazing team + high-quality data + clean, scalable code + general architecture + GPUs go brr = SOTA video generation. 2. Video editing *without* supervised data: train a *single* model

Kevin Chih-Yao Ma (@chihyaoma)'s Twitter Profile Photo

Hi friends, say hello to Movie Gen. Over the past couple of months, we've been working hard behind the scenes to bring you the latest advancements in video generation. Movie Gen not only packs text-to-video capability, but also comes with video personalization, editing, and

Roshan Sumbaly (@rsumbaly)'s Twitter Profile Photo

And not just the paper, early next week we'll be releasing our full evaluation sets - the field of media generation would really benefit from having canonical benchmarks. Stay tuned!

Samaneh Azadi (@smnh_azadi)'s Twitter Profile Photo

And here is the most exciting model we have been working on, with special capabilities in text-to-video generation, video personalization, editing, and audio generation! Plus, an invaluable tech report released! Welcome to the world, Movie Gen!

Ishan Misra (@imisra_)'s Twitter Profile Photo

We released 92 pages worth of detail including how to benchmark these models! Super critical for the scientific progress in this field :) We'll also release evaluation benchmarks next week to help the research community 💪

Andrew Brown (@andrew__brown__)'s Twitter Profile Photo

🚨 Internship in Meta GenAI NYC 🚨 I have an open PhD internship position for 2025! Interested in exploring visual generative models (or any other exciting ideas) inside the team that brought you Movie Gen and Emu Video? 📩 Send me a DM with your CV, website, and GScholar profile

Mara Levy (@mlevy1221)'s Twitter Profile Photo

How can we make Imitation Learning generalize? In my latest work we show that a keypoint-based representation can generalize to novel instances of an object and is agnostic to background changes.
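
A minimal sketch of the general idea (keypoint coordinates as the policy input rather than raw pixels, which is one way a representation can ignore background changes); the module and dimensions here are hypothetical, not the paper's implementation.

import torch
import torch.nn as nn

class KeypointPolicy(nn.Module):
    """Imitation policy that consumes 2D object keypoints instead of images."""
    def __init__(self, num_keypoints: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_keypoints * 2, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, num_keypoints, 2) -> actions: (batch, action_dim)
        return self.net(keypoints.flatten(1))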

Mannat Singh (@mannat_singh)'s Twitter Profile Photo

Flow matching can transform one distribution to another. So why do text-to-image models map noise to images instead of directly mapping text to images? Wouldn't it be cool to directly connect modalities together? CrossFlow accomplishes exactly that! cross-flow.github.io
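
The idea above (flow matching directly from the text-embedding distribution to the image-latent distribution, rather than from noise) can be sketched as a single training step; the model signature and shared shapes are assumptions on my part, not the CrossFlow code.

import torch
import torch.nn.functional as F

def flow_matching_step(model, text_emb: torch.Tensor, image_latent: torch.Tensor) -> torch.Tensor:
    """One training step: model(x_t, t) is trained to predict the velocity x1 - x0."""
    x0, x1 = text_emb, image_latent                        # assumed to share a shape
    t = torch.rand(x0.shape[0], *([1] * (x0.dim() - 1)), device=x0.device)
    x_t = (1 - t) * x0 + t * x1                            # linear interpolation path
    target_velocity = x1 - x0
    pred_velocity = model(x_t, t.flatten())                # assumed model signature
    return F.mse_loss(pred_velocity, target_velocity)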

Shijie Wang (@shijiewang20)'s Twitter Profile Photo

How can we better animate images solely following text descriptions? We present Motion Focal Loss (MotiF) (arxiv.org/abs/2412.16153) to better align motion with text descriptions in the text-image-to-video (TI2V) task and release TI2V-Bench, a comprehensive TI2V benchmark. (1/n)
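
To make the idea concrete, here is a hypothetical motion-weighted reconstruction loss in the spirit of what is described above; the actual Motion Focal Loss may well be defined differently (see arxiv.org/abs/2412.16153).

import torch

def motion_weighted_loss(pred: torch.Tensor, target_video: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """pred, target_video: (B, T, C, H, W). Upweight errors where the target video moves."""
    motion = (target_video[:, 1:] - target_video[:, :-1]).abs()           # frame differences
    weights = motion / (motion.mean(dim=(1, 2, 3, 4), keepdim=True) + eps)
    err = (pred[:, 1:] - target_video[:, 1:]) ** 2
    return (weights * err).mean()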

Ching-Yao Chuang (@chingyaochuang)'s Twitter Profile Photo

Super cool to see transformers scaling so effectively for image/video autoencoders! Our model also offers a flexible way to implement variable token lengths.

Rohit Girdhar (@_rohitgirdhar_)'s Twitter Profile Photo

Super excited to share some recent work that shows that pure, text-only LLMs can see and hear without any training! Our approach, called "MILS", uses LLMs with off-the-shelf multimodal models to caption images/videos/audio, improve image generation, do style transfer, and more!

Ishan Misra (@imisra_)'s Twitter Profile Photo

Inference time objectives are amazing :) We show that LLMs can be upgraded to multimodal beings by a simple trick :) No training needed! Works on image generation, editing, style transfer and more!
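
A minimal sketch of what such an inference-time loop could look like: a text-only LLM proposes candidate captions, an off-the-shelf image-text scorer ranks them, and the scores are fed back to the LLM as plain text. The propose and score callables below are placeholders I am assuming, not the MILS interface.

def refine_caption(image, propose, score, rounds: int = 5, keep: int = 3) -> str:
    """propose(prompt) -> list of candidate captions; score(image, caption) -> float."""
    feedback = "Describe the image."
    best = []
    for _ in range(rounds):
        candidates = propose(feedback)                     # LLM generates text-only candidates
        ranked = sorted(candidates, key=lambda c: score(image, c), reverse=True)
        best = ranked[:keep]                               # keep the highest-scoring captions
        feedback = ("These captions scored best so far: " + "; ".join(best)
                    + ". Propose better ones.")
    return best[0] if best else feedback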