Tony Wu (@tonywu_71)'s Twitter Profile
Tony Wu

@tonywu_71

Using my TFLOPS for RAG in Vision Space | ColPali co-first author 📝 | @centralesupelec 🇫🇷 x @Cambridge_Uni 🇬🇧 | @illuintech 🧑🏻‍💻

ID: 1492547583830634510

Link: https://tonywu71.notion.site/Hi-I-m-Tony-e937d2baf5ab4669904b04fd24513499?pvs=74 | Joined: 12-02-2022 17:15:02

231 Tweets

1.1K Followers

270 Following

Ravid Shwartz Ziv (@ziv_ravid):

Twitter! I can't believe no one told me that people are using vision encoders to retrieve document information these days. Based on a tip from Nadav Timor I read the "ColPali: Efficient Document Retrieval with Vision Language Models" paper, and it is very cool.

Adam Tauman Kalai (@adamfungi):

New research explains why LLMs hallucinate, through a connection between supervised and self-supervised learning. We also describe a key obstacle that can be removed to reduce them. 🧵 openai.com/index/why-lang…

Jo Kristian Bergum (@jobergum):

«Don’t use chatbots as search engines was great advice for several years... until it wasn’t.» LLMs are incredibly good at using search as a tool, which is why we at hornet.dev are building for this new user of search.

Leonie (@helloiamleonie):

I vibe coded a visual PDF search app with ColQwen2. This is how it works:
- Store PDF files as images in a Weaviate vector database
- Embed images and text with a multimodal late-interaction model (ColQwen2)
- Generate token-wise (and summed) similarity maps to highlight
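
A minimal sketch of the embed-and-score core of such a pipeline, assuming the colpali-engine package and pdf2image; the Weaviate storage step and the similarity-map rendering are left out, and the checkpoint name is just one published ColQwen2 release:

```python
# Sketch: embed PDF pages as images with ColQwen2 and score a text query against
# them via late interaction. Assumes `pip install colpali-engine pdf2image`
# (pdf2image also needs poppler). Storing embeddings in Weaviate is omitted.
import torch
from pdf2image import convert_from_path
from colpali_engine.models import ColQwen2, ColQwen2Processor

model = ColQwen2.from_pretrained(
    "vidore/colqwen2-v1.0", torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = ColQwen2Processor.from_pretrained("vidore/colqwen2-v1.0")

pages = convert_from_path("report.pdf")  # one PIL image per PDF page

with torch.no_grad():
    page_embs = model(**processor.process_images(pages).to(model.device))
    query_embs = model(**processor.process_queries(["quarterly revenue"]).to(model.device))

# MaxSim late interaction: each query token takes its best match over all page
# patches, and the per-token maxima are summed into one score per (query, page).
scores = processor.score_multi_vector(query_embs, page_embs)  # (n_queries, n_pages)
print(scores.argmax(dim=1))  # index of the best-matching page for each query
```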

Thinking Machines (@thinkymachines):

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to

Qwen (@alibaba_qwen):

🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!

🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. @ 32K+ context!)
🔹 Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &
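
For intuition on "80B params, only 3B activated": in a mixture-of-experts layer, a router picks a few experts per token, so only those experts' weights are touched. A toy top-k routing sketch, not Qwen's actual architecture (which additionally mixes Gated DeltaNet with gated attention):

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: per token, a router activates only k of
    n_experts expert networks, so most parameters sit idle on any given token."""
    def __init__(self, d_model=64, n_experts=16, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                          # x: (n_tokens, d_model)
        gates = self.router(x).softmax(dim=-1)     # routing probabilities
        weights, idx = gates.topk(self.k, dim=-1)  # pick k experts per token
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                 # only k of n_experts ever run
            for w, e in zip(weights[t], idx[t].tolist()):
                out[t] += w * self.experts[e](x[t])
        return out

moe = TopKMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64]); 2 of 16 experts per token
```
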
Clémentine Fourrier 🍊 (@clefourrier):

Wanna upgrade your agent game? With AI at Meta, we're releasing 2 incredibly cool artefacts:
- GAIA 2: assistant evaluation with a twist (new: adaptability, robustness to failure & time sensitivity)
- ARE, an agent research environment to empower all!

huggingface.co/blog/gaia2

Manuel Faysse (@manuelfaysse):

MetaEmbed is a cool new paper by Zilin Xiao in which we append extra writeable "memory tokens" at the end of ColPali tokens and only store and use those for Late Interaction. This reduces the memory footprint, yet retains rich query/doc granular interaction that scales well

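A hedged sketch of the idea: a handful of learnable memory vectors summarize the full multi-vector document embedding, and only those get stored and scored with MaxSim. The tweet describes appending the memory tokens to the ColPali token sequence during encoding; this sketch approximates that with a cross-attention pooling layer, and every name in it is hypothetical:

```python
import torch
import torch.nn as nn

class MemoryTokenPooler(nn.Module):
    """Hypothetical sketch: compress a ColPali-style multi-vector document
    embedding into a few writeable memory tokens via cross-attention, so only
    (num_memory x dim) vectors per document need to be stored for late interaction."""
    def __init__(self, dim=128, num_memory=8):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_memory, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, doc_tokens):  # (batch, seq, dim): all patch/token vectors
        mem = self.memory.expand(doc_tokens.size(0), -1, -1)
        compressed, _ = self.attn(mem, doc_tokens, doc_tokens)  # memory reads the doc
        return compressed  # (batch, num_memory, dim): all that gets indexed

def maxsim(query, doc):  # late interaction over the compressed memory only
    sim = query @ doc.transpose(-1, -2)          # (batch, q_len, num_memory)
    return sim.max(dim=-1).values.sum(dim=-1)    # max over doc, sum over query

pooler = MemoryTokenPooler()
doc = pooler(torch.randn(1, 1030, 128))          # e.g. 1030 image patches -> 8 vectors
print(maxsim(torch.randn(1, 12, 128), doc))      # one score per document
```
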
Qwen (@alibaba_qwen):

🚀 We're thrilled to unveil Qwen3-VL — the most powerful vision-language model in the Qwen series yet!

🔥 The flagship model Qwen3-VL-235B-A22B is now open-sourced and available in both Instruct and Thinking versions:  
✅ Instruct outperforms Gemini 2.5 Pro on key vision
Omar Khattab (@lateinteraction):

Your periodic reminder that late interaction isn’t “awesome but takes a lot of space” as I see here often. ColBERT vectors are often 10 bytes each. Ten bytes. That’s like 3-4 floats. It’s about *interactions* (aka ~attention) not “many vectors”. It’s not “many vectors work
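
For context, late interaction scores a query against a document by matching every query token against every document token (ColBERT's MaxSim), which is where the "it's about interactions, not many vectors" framing comes from; the per-vector storage can be quantized aggressively without touching that mechanism. A minimal sketch:

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction: each query token finds its best-matching
    document token (max), then the per-token maxima are summed. The expressive
    power lies in these token-level interactions, not in how many bytes each
    stored vector occupies."""
    sim = query_vecs @ doc_vecs.T   # (q_tokens, d_tokens) dot products
    return sim.max(axis=1).sum()    # max over doc tokens, sum over query tokens

rng = np.random.default_rng(0)
q = rng.standard_normal((32, 128))   # 32 query token embeddings
d = rng.standard_normal((300, 128))  # 300 document token embeddings
print(maxsim_score(q, d))
```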

merve (@mervenoyann):

alrighty, publicly sharing my slide deck for multimodal AI, covering ⤵️
> trends & uses
> cool open-source models
> tools to customize/deploy multimodal models
> further resources

all models in this presentation are on Hugging Face, easy load with 2 LoC!
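
The "2 LoC" presumably refers to the transformers pipeline API; a sketch under that assumption, with the checkpoint name as an arbitrary example rather than one confirmed to be in the deck:

```python
from transformers import pipeline

# Two-line load of a multimodal model from the Hugging Face Hub; the
# "image-text-to-text" task tag requires a recent transformers release.
pipe = pipeline("image-text-to-text", model="Qwen/Qwen2-VL-2B-Instruct")
```
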
Manuel Faysse (@manuelfaysse):

The new DeepSeek 3.2 attention mechanism is basically the well-known retrieve+rerank paradigm we love in IR: approximate the top-k with something fast but low-precision, then refine the top-k to get bounded macro complexity.

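The analogy, in generic IR-style code: a cheap low-precision pass proposes candidates over the whole set, and exact scoring only ever runs on k of them, so the expensive work is bounded. Nothing here is DeepSeek's actual kernel:

```python
import numpy as np

def retrieve_then_rerank(query, items, k=8):
    """Generic retrieve+rerank: score everything with a cheap low-precision
    proxy, keep the approximate top k, then rescore only those exactly. The
    exact-scoring cost is bounded by k regardless of len(items)."""
    cheap = query.astype(np.float16) @ items.astype(np.float16).T  # fast, approximate
    candidates = np.argsort(-cheap)[:k]                            # approximate top-k
    exact = items[candidates] @ query                              # precise rescoring
    return candidates[np.argsort(-exact)]                          # refined ranking

rng = np.random.default_rng(0)
items = rng.standard_normal((10_000, 64)).astype(np.float32)
query = rng.standard_normal(64).astype(np.float32)
print(retrieve_then_rerank(query, items))
```
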
tomaarsen (@tomaarsen):

We're announcing a new update to MTEB: RTEB

It's a new multilingual text embedding retrieval benchmark with private (!) datasets, to ensure that we measure true generalization and avoid (accidental) overfitting.

Details in our blogpost below 🧵
Tony Wu (@tonywu_71):

Really nice work as a follow-up to ColPali (multimodal document retrieval) with super interesting ablations. Congrats guys!