Walid BOUSSELHAM (@bousselhamwalid) 's Twitter Profile
Walid BOUSSELHAM

@bousselhamwalid

PhD Student at Bonn University |
Computer Vision, Multi-modal learning and Zero-shot adaptation.
Prev. @NUSingapore & @ENSTAParis
Visiting MIT for summer 2024

ID: 1302975089802178560

Website: http://walidbousselham.com/ | Joined: 07-09-2020 14:21:00

106 Tweets

104 Followers

212 Following

AK (@_akhaliq) 's Twitter Profile Photo

Google DeepMind announces Vision-Language Models as a Source of Rewards

paper page: huggingface.co/papers/2312.09…

Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting
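
The tweet above is cut off, but the gist of the paper is to score each observation against a text description of the goal with a frozen vision-language model and use that score as the reward. A minimal sketch of that general idea, assuming an off-the-shelf CLIP checkpoint from Hugging Face (this is not DeepMind's model or reward shaping, and the threshold is a placeholder):

```python
# Hedged sketch: derive a scalar reward from a frozen VLM (here CLIP via Hugging Face
# transformers). Illustrates the general "VLM as reward" idea, not the paper's setup.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def vlm_reward(frame: Image.Image, goal_text: str, threshold: float = 0.25) -> float:
    """Cosine similarity between the goal description and the current frame,
    binarized with a placeholder threshold to give a sparse reward."""
    inputs = processor(text=[goal_text], images=frame, return_tensors="pt", padding=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    sim = torch.nn.functional.cosine_similarity(img, txt).item()
    return 1.0 if sim > threshold else 0.0  # could also return sim as a dense reward

# Usage: reward = vlm_reward(env_frame, "the agent is standing next to a blue door")
```
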
Hilde Kuehne (@hildekuehne) 's Twitter Profile Photo

Monika Wysoczańska Super cool work! We found something similar to the idea of pooling for GEM … github.com/WalBouss/GEM … it might be interesting to think about how to merge them… Overall it looks like an exciting topic! Thanks for sharing 🤩!

Lucas Beyer (bl16) (@giffmana) 's Twitter Profile Photo

super-TL;DR: revival of foveations in vision. They also create a new benchmark for VLMs which is another thing we really need. This looks pretty cool, and after some thinking in 2023, it's also a direction I'm eager to explore in 2024. Can't wait to fully dig into this paper!

fly51fly (@fly51fly) 's Twitter Profile Photo

[CV] EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models  
arxiv.org/abs/2401.11739  

This paper presents a method for image segmentation using diffusion models. By extracting semantic knowledge from a pre-trained diffusion model, fine-grained segmentation maps
Omar Sanseviero (@osanseviero) 's Twitter Profile Photo

Nice blog post: makeMoE. It implements a MoE block + router, discusses initialization, and provides very intuitive explanations of how they work: huggingface.co/blog/AviSoori1…
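
For context, here is a minimal sketch of what such an MoE block + top-k router looks like in PyTorch. It is a generic illustration, not the blog post's code; the dimensions, expert count, and top-k value are arbitrary.

```python
# Minimal sketch of a mixture-of-experts feed-forward block with a top-k router.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEBlock(nn.Module):
    def __init__(self, d_model=256, d_ff=512, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (batch, seq, d_model)
        logits = self.router(x)                          # (B, S, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # each token picks its top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# tokens = torch.randn(2, 10, 256); y = MoEBlock()(tokens)  # output has the same shape
```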

AK (@_akhaliq) 's Twitter Profile Photo

pix2gestalt: Amodal Segmentation by Synthesizing Wholes paper page: huggingface.co/papers/2401.14… synthesizes whole objects from only partially visible ones, enabling amodal segmentation, recognition, and 3D reconstruction of occluded objects

Alexander Visheratin (@visheratin) 's Twitter Profile Photo

VLMs have a resolution problem, which prevents them from finding small details in large images. In this Hugging Face community post, I discuss ways to solve it and describe the details of the MC-LLaVA architecture: huggingface.co/blog/visherati…
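
One common remedy for this resolution problem is to encode a downscaled global view alongside a grid of full-resolution tiles. A rough sketch of that generic multi-crop idea follows; it is not the MC-LLaVA implementation, and the tile size is arbitrary.

```python
# Sketch of the generic multi-crop trick for large images: a resized global view plus
# full-resolution tiles, each of which is then fed to the vision encoder separately.
from PIL import Image

def make_views(image: Image.Image, tile: int = 336):
    """Return a low-res global view plus tiles covering the full-resolution image."""
    views = [image.resize((tile, tile))]                # global context at encoder resolution
    w, h = image.size
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            box = (left, top, min(left + tile, w), min(top + tile, h))
            views.append(image.crop(box).resize((tile, tile)))
    return views

# img = Image.open("large_scene.jpg"); crops = make_views(img)
```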

Hilde Kuehne (@hildekuehne) 's Twitter Profile Photo

Did you know that a pretrained VL foundation model is all you need for open-vocabulary localization and segmentation? ❗️No training needed❗️ Just get your favorite model and localize everything! 💎 Pro-tip: try meta-clip. All you need to know is below 👇 x.com/vittoferraricv…
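
For readers wondering how localization with no training is possible at all: the crudest baseline is to score a frozen CLIP's patch tokens against the text embedding and read the result off as a heatmap. The rough sketch below shows only that baseline; GEM's actual contribution (training-free self-self attention pooling) is not reproduced here, and the checkpoint name is just an example.

```python
# Rough baseline sketch: zero-shot localization by scoring CLIP patch tokens against a
# text prompt. This is NOT GEM itself, only the plain patch-similarity baseline.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

@torch.no_grad()
def localization_heatmap(image: Image.Image, prompt: str) -> torch.Tensor:
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    vision_out = model.vision_model(pixel_values=inputs["pixel_values"])
    patches = vision_out.last_hidden_state[:, 1:]        # drop the CLS token
    patches = model.visual_projection(model.vision_model.post_layernorm(patches))
    text = model.get_text_features(input_ids=inputs["input_ids"],
                                   attention_mask=inputs["attention_mask"])
    sim = F.normalize(patches, dim=-1) @ F.normalize(text, dim=-1).T  # (1, num_patches, 1)
    sim = sim.squeeze(-1).squeeze(0)
    side = int(sim.shape[0] ** 0.5)                      # 14x14 grid for ViT-B/16 at 224px
    return sim.reshape(side, side)                       # upsample to image size to visualize

# heatmap = localization_heatmap(Image.open("cat_and_dog.jpg"), "a photo of a dog")
```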

Vittorio Ferrari (@vittoferraricv) 's Twitter Profile Photo

Paper accepted to #CVPR2024!

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

Paper: arxiv.org/abs/2312.00878
Demo: huggingface.co/spaces/WalidBo…
Code: github.com/WalBouss/GEM

With Walid BOUSSELHAM, Felix Petersen, Hilde Kuehne
Hadi Pouransari (@hpouransari) 's Twitter Profile Photo

Are you interested in SOTA compact CLIP models? 🚀🚀 Check out our open-sourced repo for a family of MobileCLIP models, including a ViT-B@224 with 77.2% IN-top1 accuracy. More highlights in 🧵 Paper (appearing in CVPR 2024): arxiv.org/abs/2311.17049 Repo: github.com/apple/ml-mobil…

XuDong Wang (@xdwang101) 's Twitter Profile Photo

🚀 Excited to share InstanceDiffusion @CVPR2024! It adds precise instance-level control for image gen: free-form text conditions per instance and diverse location specs—points, scribbles, boxes & instance masks Code: shorturl.at/dtxSW arXiv: shorturl.at/rQS14 1/n

Cohere For AI (@cohereforai) 's Twitter Profile Photo

Join our community-led Geo-Regional Asia Group on Monday, May 27th as they welcome Walid BOUSSELHAM, PhD student at Bonn University, for a presentation on "An Explainability Method for Vision Transformers via Feature Formation Sensitivity."

Learn more: cohere.com/events/c4ai-Wa…
Cohere For AI (@cohereforai) 's Twitter Profile Photo

Be sure to tune in Monday for our community-led event with speaker Walid BOUSSELHAM , who will be presenting "An Explainability Method for Vision Transformers via Feature Formation Sensitivity." 💻

Vittorio Ferrari (@vittoferraricv) 's Twitter Profile Photo

Come to poster 354 at #CVPR2024 to see our work! 10:30am today, Arch 4A-E

"Grounding Everything: Emerging Localization Properties in Vision-Language Transformers"

Paper: arxiv.org/abs/2312.00878
Demo: huggingface.co/spaces/WalidBo…
Code: github.com/WalBouss/GEM
Hilde Kuehne (@hildekuehne) 's Twitter Profile Photo

🚨 New Paper Alert! 🚨 *Mask Inversion*

What does it do? It learns a token representation for a specific region of an image (e.g., a token for a mask, a bounding box, etc.)

project page: walidbousselham.com/MaskInversion/
paper: arxiv.org/abs/2407.20034
code: github.com/WalBouss/MaskI…
Walid BOUSSELHAM (@bousselhamwalid) 's Twitter Profile Photo

🚀 Excited to share our latest research, "MaskInversion: Localized Embeddings via Optimization of Explainability Maps"! Discover how we enhance vision-language models like CLIP for precise image region representation without fine-tuning the underlying foundation model.
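
A heavily simplified sketch of that idea: keep the foundation model frozen and optimize a single query embedding so that its map over the image matches the target region. Here a plain patch-similarity map stands in for the paper's explainability maps, and the checkpoint, temperature, and optimization hyperparameters are placeholders.

```python
# Heavily simplified sketch: optimize one query embedding against a frozen CLIP so that
# its patch-similarity map matches a target mask. MaskInversion itself optimizes against
# explainability maps; this plain similarity proxy is only for illustration.
import torch
import torch.nn.functional as F
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16").eval()
for p in model.parameters():
    p.requires_grad_(False)                       # the foundation model stays frozen

def region_embedding(pixel_values, target_mask, steps=100, lr=0.05):
    """pixel_values: (1, 3, 224, 224) CLIP-preprocessed image.
    target_mask: (14, 14) binary mask on the patch grid."""
    feats = model.vision_model(pixel_values=pixel_values).last_hidden_state[:, 1:]
    feats = model.visual_projection(model.vision_model.post_layernorm(feats))
    feats = F.normalize(feats, dim=-1)            # (1, 196, 512) frozen patch features

    query = torch.randn(1, feats.shape[-1], requires_grad=True)   # the localized embedding
    opt = torch.optim.Adam([query], lr=lr)
    for _ in range(steps):
        sim = (feats @ F.normalize(query, dim=-1).T).squeeze(-1)  # (1, 196) map
        loss = F.binary_cross_entropy_with_logits(sim.reshape(14, 14) / 0.07,
                                                  target_mask.float())
        opt.zero_grad(); loss.backward(); opt.step()
    return query.detach()                         # region embedding, e.g. for retrieval
```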

Nina Shvetsova (@ninashv__) 's Twitter Profile Photo

🚀We release the dataset, code, models for our "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale," presented at #ECCV2024!🎥📚

🔗 GitHub: github.com/ninatu/howtoca…
🔗 arXiv: arxiv.org/abs/2310.04900

Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne