
Walid BOUSSELHAM
@bousselhamwalid
PhD Student at Tübingen AI Center |
Computer Vision, Multi-modal learning and Zero-shot adaptation
Prev. @NUSingapore & @ENSTAParis
Visiting MIT for Summer 2024
ID: 1302975089802178560
http://walidbousselham.com/
Joined: 07-09-2020 14:21:00
108 Tweets
119 Followers
307 Following

Monika Wysoczańska Super cool work! We found something similar to the idea of pooling for GEM … github.com/WalBouss/GEM … might be interesting to think about how to merge… overall it looks like an exciting topic! Thanks for sharing 🤩!

VLMs have a resolution problem that prevents them from finding small details in large images. In this Hugging Face community post, I discuss ways to solve it and describe the details of the MC-LLaVA architecture: huggingface.co/blog/visherati…
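The post has the full details; as a rough illustration of the multi-crop idea (a minimal sketch, not the MC-LLaVA code — `make_crops`, `crop_size`, and `stride` are hypothetical names, and the 336 px size assumes a CLIP ViT-L/14-336-style encoder):

```python
# Sketch: split a large image into overlapping fixed-size crops so a
# fixed-resolution vision encoder can still see small details.
from PIL import Image

def make_crops(image: Image.Image, crop_size: int = 336, stride: int = 224):
    """Return overlapping crop_size x crop_size crops of `image`.

    crop_size matches the vision encoder's input resolution; a stride
    smaller than crop_size overlaps crops so small objects are not cut
    in half at crop borders. (For guaranteed full coverage, a final
    crop flush with the right/bottom edge could be added.)
    """
    w, h = image.size
    crops = []
    for top in range(0, max(h - crop_size, 0) + 1, stride):
        for left in range(0, max(w - crop_size, 0) + 1, stride):
            crops.append(image.crop((left, top, left + crop_size, top + crop_size)))
    return crops
```

For a 1024×1024 input this yields a 4×4 grid of 16 overlapping 336 px crops; each crop is encoded at the encoder's native resolution, so fine detail survives, at the cost of more visual tokens passed to the LLM.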

Paper accepted to #CVPR2024! Grounding Everything: Emerging Localization Properties in Vision-Language Transformers. Paper: arxiv.org/abs/2312.00878 Demo: huggingface.co/spaces/WalidBo… Code: github.com/WalBouss/GEM With Walid BOUSSELHAM, Felix Petersen, Hilde Kuehne

I had so much fun working on LeGrad with Walid BOUSSELHAM, Hendrik S (find me on Bluesky), and Hilde Kuehne! Demo it here 💻 huggingface.co/spaces/WalidBo…

Join our community-led Geo-Regional Asia Group on Monday, May 27th as they welcome Walid BOUSSELHAM, PhD student at Bonn University, for a presentation on "An Explainability Method for Vision Transformers via Feature Formation Sensitivity." Learn more: cohere.com/events/c4ai-Wa…

Be sure to tune in Monday for our community-led event with speaker Walid BOUSSELHAM, who will be presenting "An Explainability Method for Vision Transformers via Feature Formation Sensitivity." 💻

🚀 We release the dataset, code, and models for our paper "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale," presented at #ECCV2024! 🎥📚 🔗 GitHub: github.com/ninatu/howtoca… 🔗 arXiv: arxiv.org/abs/2310.04900 Anna Kukleva Xudong Hong will go to #NAACL2024 @chrirupp Bernt Schiele Hilde Kuehne
