Yasser Benigmim (@yasserbenigmim)'s Twitter Profile
Yasser Benigmim

@yasserbenigmim

PhD student at Télécom Paris, @DeepLearning, @ComputerVision

ID: 2369825181

Link: https://yasserben.github.io/
Joined: 27-02-2014 21:31:21

46 Tweets

122 Followers

1.1K Following

Robin Courant (@robin_courant)

Happy to present E.T. the Exceptional Trajectories: Text-to-Camera-Trajectory Generation with Character Awareness (ECCV 2024), with Nicolas DUFOUR, Xi WANG, Marc Christie and Vicky Kalogeiton. Paper: arxiv.org/pdf/2407.01516 Webpage: lix.polytechnique.fr/vista/projects…

rid (@ridouaneg_)

(1/8) 🎬 Introducing the Short Film Dataset (SFD), a long video QA benchmark with 1k short films and 5k questions.

Why another videoQA dataset?
📖 Story-level QAs
🎥 Publicly available videos
🔒 Minimal data leakage
⏳ Long temporal context questions

shortfilmdataset.github.io

Subhankar Roy (@sroy907)

Less is more! Continual learning with task-specific ViTs is computationally expensive. To make task-specific ViTs affordable, we propose summarizing the patch tokens. The shorter token sequence produced by patch summarization reduces the number of multi-head self-attention (MSA) operations without hurting performance. More info 👇

Hugo (@mldhug)

You want to give audio abilities to your VLM without compromising its vision performance? You want to align your audio encoder with a pretrained image encoder without suffering from the modality gap? Check our #NeurIPS2024 paper with Michel Olvera Stéphane LATHUILIÈRE and Slim Essid

Xi WANG (@xiwang92)

🎥 AKiRa provides control over camera motion and optics (focal length, distortion, aperture) in video diffusion, enabling cinematic effects like fisheye, focus shifts, and dolly zoom. 📄 Paper: arxiv.org/abs/2412.14158 👉 Project Page: lix.polytechnique.fr/vista/projects… 🧵👇

Gianni Franchi (@giannifranchi10)

🚨 New survey published! 🔍 Explainability & Vision Foundation Models dives into the intersection of #XAI and #FoundationModels in vision. We present: ✅ A novel taxonomy ✅ Key challenges ✅ Foundation Models 📖 Read it here 👉 shorturl.at/8S4eD #AI #ComputerVision

Imad (@imadmarouf3)

I’m building CurriboxAI, an AI SaaS to help recruiters & ESNs protect their consultants from client bypass. Here’s the tech stack powering it so far, built for speed & automation. Always open to feedback & curious what you would’ve done differently 👇 #buildinpublic #saas

Junyu Xie (@junyuxiearthur)

Movies are more than just video clips, they are stories! 🎬

We’re hosting the 1st SLoMO Workshop at #ICCV2025 to discuss Story-Level Movie Understanding & Audio Descriptions!

Website: slomo-workshop.github.io
Competition: huggingface.co/spaces/SLoMO-W…