Tony Wu
@tonywu_71
Using my TFLOPS for RAG in Vision Space | ColPali co-first author 📝 | @centralesupelec 🇫🇷 x @Cambridge_Uni 🇬🇧 | @illuintech 🧑🏻💻
ID: 1492547583830634510
https://tonywu71.notion.site/Hi-I-m-Tony-e937d2baf5ab4669904b04fd24513499?pvs=74 12-02-2022 17:15:02
231 Tweets
1.1K Followers
270 Following
Twitter! I can't believe no one told me that people are using vision encoders to retrieve document information these days. Based on a tip from Nadav Timor I read the "ColPali: Efficient Document Retrieval with Vision Language Models" paper, and it is very cool.
I vibe coded a visual PDF search app with ColQwen2. This is how it works: - Store PDF pages as images in a Weaviate vector database - Embed images and text with a multimodal late-interaction model (ColQwen2) - Generate token-wise (and summed) similarity maps to highlight the most relevant regions
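The retrieval step above boils down to late-interaction (MaxSim) scoring: every query token is matched against every image-patch embedding, and the similarity maps come from the same token-wise matrix. Here is a minimal NumPy sketch of that scoring; the function names, shapes, and the assumption of L2-normalized embeddings are mine, not the app's actual code.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score, ColPali-style.

    query_emb: (n_query_tokens, dim), doc_emb: (n_doc_patches, dim),
    both assumed L2-normalized per token.
    """
    # Token-wise similarity matrix: (n_query_tokens, n_doc_patches)
    sim = query_emb @ doc_emb.T
    # Each query token keeps its best-matching patch; sum over query tokens.
    return float(sim.max(axis=1).sum())

def similarity_map(query_emb: np.ndarray, patch_emb: np.ndarray,
                   grid_hw: tuple) -> np.ndarray:
    """Per-patch relevance: max similarity over query tokens,
    reshaped to the image's patch grid (h, w) for heatmap overlay."""
    sim = query_emb @ patch_emb.T          # (n_query_tokens, n_patches)
    return sim.max(axis=0).reshape(grid_hw)
```

Ranking all stored pages by `maxsim_score` and overlaying `similarity_map` on the page image gives exactly the "retrieve, then highlight" behavior described above.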
Wanna upgrade your agent game? With AI at Meta , we're releasing 2 incredibly cool artefacts: - GAIA 2: assistant evaluation with a twist (new: adaptability, robustness to failure & time sensitivity) - ARE, an agent research environment to empower all! huggingface.co/blog/gaia2
MetaEmbed is a cool new paper by Zilin Xiao in which extra writeable "memory tokens" are appended at the end of the ColPali tokens, and only those are stored and used for Late Interaction. This reduces the memory footprint yet retains rich, granular query/doc interaction that scales well
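The memory-token trick above can be sketched in a few lines: since the memory tokens sit at the end of the input sequence, indexing only needs the trailing slice of the encoder output, which shrinks storage from all token embeddings to a handful. This is a toy NumPy illustration under my own assumptions (sizes, names, random vectors standing in for real encoder outputs), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_MEM = 128, 4  # hypothetical embedding dim and memory-token count

def l2norm(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def compress_for_index(encoder_output: np.ndarray, n_mem: int) -> np.ndarray:
    """MetaEmbed-style compression: the input sequence had n_mem learnable
    memory tokens appended, so only their output embeddings are stored."""
    return encoder_output[-n_mem:]

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late interaction over the compressed representation."""
    return float((query_emb @ doc_emb.T).max(axis=1).sum())
```

With, say, ~1000 patch tokens per page, storing 4 memory vectors instead of the full sequence is a ~250x reduction in index size, while queries still interact token-by-token with the stored vectors.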
alrighty, publicly sharing my slide deck for multimodal AI, covering ⤵️ > trends & uses > cool open-source models > tools to customize/deploy multimodal models > further resources all models in this presentation are on Hugging Face, easy load with 2 LoC!