Walid BOUSSELHAM (@bousselhamwalid) 's Twitter Profile
Walid BOUSSELHAM

@bousselhamwalid

PhD Student at Bonn University |
Computer Vision, Multi-modal learning and Zero-shot adaptation.
Prev. @NUSingapore & @ENSTAParis
Visiting MIT for summer 2024

ID: 1302975089802178560

Website: http://walidbousselham.com/ | Joined: 07-09-2020 14:21:00

106 Tweets

104 Followers

212 Following

AK (@_akhaliq) 's Twitter Profile Photo

Google DeepMind announces Vision-Language Models as a Source of Rewards

paper page: huggingface.co/papers/2312.09…

Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting
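
The tweet above is cut off, but the gist of the paper is to score each observation against a text description of the goal with a frozen vision-language model and use that score as the reward. A minimal sketch of that general idea, assuming an off-the-shelf CLIP checkpoint from Hugging Face (this is not DeepMind's model or reward shaping, and the threshold is a placeholder):

```python
# Hedged sketch: derive a scalar reward from a frozen VLM (here CLIP via Hugging Face
# transformers). Illustrates the general "VLM as reward" idea, not the paper's setup.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def vlm_reward(frame: Image.Image, goal_text: str, threshold: float = 0.25) -> float:
    """Cosine similarity between the goal description and the current frame,
    binarized with a placeholder threshold to give a sparse reward."""
    inputs = processor(text=[goal_text], images=frame, return_tensors="pt", padding=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    sim = torch.nn.functional.cosine_similarity(img, txt).item()
    return 1.0 if sim > threshold else 0.0  # could also return sim as a dense reward

# Usage: reward = vlm_reward(env_frame, "the agent is standing next to a blue door")
```
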
Hilde Kuehne (@hildekuehne) 's Twitter Profile Photo

Monika Wysoczańska Super cool work! We found something similar to the idea of pooling for GEM … github.com/WalBouss/GEM … it might be interesting to think about how to merge them… Overall it looks like an exciting topic! Thanks for sharing 🤩!

Lucas Beyer (bl16) (@giffmana) 's Twitter Profile Photo

super-TL;DR: revival of foveations in vision. They also create a new benchmark for VLMs which is another thing we really need. This looks pretty cool, and after some thinking in 2023, it's also a direction I'm eager to explore in 2024. Can't wait to fully dig into this paper!

fly51fly (@fly51fly) 's Twitter Profile Photo

[CV] EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models  
arxiv.org/abs/2401.11739  

This paper presents a method for image segmentation using diffusion models. By extracting semantic knowledge from a pre-trained diffusion model, fine-grained segmentation maps
Omar Sanseviero (@osanseviero) 's Twitter Profile Photo

Nice blog post: makeMoE. It implements a MoE block + router, discusses initialization, and provides very intuitive explanations of how they work: huggingface.co/blog/AviSoori1…
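
For context, here is a minimal sketch of what such an MoE block + top-k router looks like in PyTorch. It is a generic illustration, not the blog post's code; the dimensions, expert count, and top-k value are arbitrary.

```python
# Minimal sketch of a mixture-of-experts feed-forward block with a top-k router.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEBlock(nn.Module):
    def __init__(self, d_model=256, d_ff=512, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (batch, seq, d_model)
        logits = self.router(x)                          # (B, S, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # each token picks its top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# tokens = torch.randn(2, 10, 256); y = MoEBlock()(tokens)  # output has the same shape
```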

AK (@_akhaliq) 's Twitter Profile Photo

pix2gestalt: Amodal Segmentation by Synthesizing Wholes paper page: huggingface.co/papers/2401.14… synthesizes whole objects from only partially visible ones, enabling amodal segmentation, recognition, and 3D reconstruction of occluded objects

Alexander Visheratin (@visheratin) 's Twitter Profile Photo

VLMs have a resolution problem, which prevents them from finding small details in large images. In this Hugging Face community post, I discuss ways to solve it and describe the details of the MC-LLaVA architecture: huggingface.co/blog/visherati…
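
One common remedy for this resolution problem is to encode a downscaled global view alongside a grid of full-resolution tiles. A rough sketch of that generic multi-crop idea follows; it is not the MC-LLaVA implementation, and the tile size is arbitrary.

```python
# Sketch of the generic multi-crop trick for large images: a resized global view plus
# full-resolution tiles, each of which is then fed to the vision encoder separately.
from PIL import Image

def make_views(image: Image.Image, tile: int = 336):
    """Return a low-res global view plus tiles covering the full-resolution image."""
    views = [image.resize((tile, tile))]                # global context at encoder resolution
    w, h = image.size
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            box = (left, top, min(left + tile, w), min(top + tile, h))
            views.append(image.crop(box).resize((tile, tile)))
    return views

# img = Image.open("large_scene.jpg"); crops = make_views(img)
```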

Hilde Kuehne (@hildekuehne) 's Twitter Profile Photo

Did you know that a pretrained VL foundation model is all you need for open-vocabulary localization and segmentation? ❗️No training needed❗️ Just get your favorite model and localize everything! 💎 Pro-tip: try meta-clip. All you need to know is below 👇 x.com/vittoferraricv…
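
For readers wondering how localization with no training is possible at all: the crudest baseline is to score a frozen CLIP's patch tokens against the text embedding and read the result off as a heatmap. The rough sketch below shows only that baseline; GEM's actual contribution (training-free self-self attention pooling) is not reproduced here, and the checkpoint name is just an example.

```python
# Rough baseline sketch: zero-shot localization by scoring CLIP patch tokens against a
# text prompt. This is NOT GEM itself, only the plain patch-similarity baseline.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

@torch.no_grad()
def localization_heatmap(image: Image.Image, prompt: str) -> torch.Tensor:
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    vision_out = model.vision_model(pixel_values=inputs["pixel_values"])
    patches = vision_out.last_hidden_state[:, 1:]        # drop the CLS token
    patches = model.visual_projection(model.vision_model.post_layernorm(patches))
    text = model.get_text_features(input_ids=inputs["input_ids"],
                                   attention_mask=inputs["attention_mask"])
    sim = F.normalize(patches, dim=-1) @ F.normalize(text, dim=-1).T  # (1, num_patches, 1)
    sim = sim.squeeze(-1).squeeze(0)
    side = int(sim.shape[0] ** 0.5)                      # 14x14 grid for ViT-B/16 at 224px
    return sim.reshape(side, side)                       # upsample to image size to visualize

# heatmap = localization_heatmap(Image.open("cat_and_dog.jpg"), "a photo of a dog")
```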

Vittorio Ferrari (@vittoferraricv) 's Twitter Profile Photo

Paper accepted to #CVPR2024!

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

Paper: arxiv.org/abs/2312.00878
Demo: huggingface.co/spaces/WalidBo…
Code: github.com/WalBouss/GEM

With Walid BOUSSELHAM, Felix Petersen, Hilde Kuehne
Hadi Pouransari (@hpouransari) 's Twitter Profile Photo

Are you interested in SOTA compact CLIP models? 🚀🚀 Check out our open-sourced repo for a family of MobileCLIP models, including a ViT-B@224 with 77.2% IN-top1 accuracy. More highlights in 🧵 Paper (appearing in CVPR 2024): arxiv.org/abs/2311.17049 Repo: github.com/apple/ml-mobil…

XuDong Wang (@xdwang101) 's Twitter Profile Photo

🚀 Excited to share InstanceDiffusion @CVPR2024! It adds precise instance-level control for image gen: free-form text conditions per instance and diverse location specs—points, scribbles, boxes & instance masks Code: shorturl.at/dtxSW arXiv: shorturl.at/rQS14 1/n

Cohere For AI (@cohereforai) 's Twitter Profile Photo

Join our community-led Geo-Regional Asia Group on Monday, May 27th as they welcome Walid BOUSSELHAM, PhD student at Bonn University, for a presentation on "An Explainability Method for Vision Transformers via Feature Formation Sensitivity."

Learn more: cohere.com/events/c4ai-Wa…
Cohere For AI (@cohereforai) 's Twitter Profile Photo

Be sure to tune in Monday for our community-led event with speaker Walid BOUSSELHAM , who will be presenting "An Explainability Method for Vision Transformers via Feature Formation Sensitivity." 💻

Vittorio Ferrari (@vittoferraricv) 's Twitter Profile Photo

Come to poster 354 at #CVPR2024 to see our work! 10:30am today, Arch 4A-E

"Grounding Everything: Emerging Localization Properties in Vision-Language Transformers"

Paper: arxiv.org/abs/2312.00878
Demo: huggingface.co/spaces/WalidBo…
Code: github.com/WalBouss/GEM
Hilde Kuehne (@hildekuehne) 's Twitter Profile Photo

🚨 New Paper Alert! 🚨 *Mask Inversion*

What does it do? It learns a token representation for a specific region of an image (e.g., a token for a mask, a bounding box, etc.)

project page: walidbousselham.com/MaskInversion/
paper: arxiv.org/abs/2407.20034
code: github.com/WalBouss/MaskI…
Walid BOUSSELHAM (@bousselhamwalid) 's Twitter Profile Photo

🚀 Excited to share our latest research, "MaskInversion: Localized Embeddings via Optimization of Explainability Maps"! Discover how we enhance vision-language models like CLIP for precise image region representation without fine-tuning the underlying foundation model.
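
A heavily simplified sketch of that idea: keep the foundation model frozen and optimize a single query embedding so that its map over the image matches the target region. Here a plain patch-similarity map stands in for the paper's explainability maps, and the checkpoint, temperature, and optimization hyperparameters are placeholders.

```python
# Heavily simplified sketch: optimize one query embedding against a frozen CLIP so that
# its patch-similarity map matches a target mask. MaskInversion itself optimizes against
# explainability maps; this plain similarity proxy is only for illustration.
import torch
import torch.nn.functional as F
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16").eval()
for p in model.parameters():
    p.requires_grad_(False)                       # the foundation model stays frozen

def region_embedding(pixel_values, target_mask, steps=100, lr=0.05):
    """pixel_values: (1, 3, 224, 224) CLIP-preprocessed image.
    target_mask: (14, 14) binary mask on the patch grid."""
    feats = model.vision_model(pixel_values=pixel_values).last_hidden_state[:, 1:]
    feats = model.visual_projection(model.vision_model.post_layernorm(feats))
    feats = F.normalize(feats, dim=-1)            # (1, 196, 512) frozen patch features

    query = torch.randn(1, feats.shape[-1], requires_grad=True)   # the localized embedding
    opt = torch.optim.Adam([query], lr=lr)
    for _ in range(steps):
        sim = (feats @ F.normalize(query, dim=-1).T).squeeze(-1)  # (1, 196) map
        loss = F.binary_cross_entropy_with_logits(sim.reshape(14, 14) / 0.07,
                                                  target_mask.float())
        opt.zero_grad(); loss.backward(); opt.step()
    return query.detach()                         # region embedding, e.g. for retrieval
```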

Nina Shvetsova (@ninashv__) 's Twitter Profile Photo

🚀We release the dataset, code, models for our "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale," presented at #ECCV2024!🎥📚

🔗 GitHub: github.com/ninatu/howtoca…
🔗 arXiv: arxiv.org/abs/2310.04900

Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne