Henry AI Labs (@labs_henry) Twitter Tweets • TwiCopy

Connor Shorten

a month ago

Late Interaction is not only great for inference, but also for training!! 🏭 Fine-tuning single-vector embedding models hasn’t really taken off… Late Interaction could change this. One of my favorite takeaways from the podcast, here is a clip explaining this further in ~1

28 likes · 2 replies · 4 reposts

Amélie Chatelain

@amelietabatta

a month ago

At LightOn, there's a lot of peer pressure to convert into a late interaction enjoyer! But it just makes sense: it's keyword search, evolved. Semantic search lost token-level granularity. Hybrid patches that. Late interaction fixes it natively: same granularity, learned space.

10 likes · 2 replies · 5 reposts
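The token-level matching described in the post above is usually scored with a "MaxSim" rule: each query token picks its best-matching document token, and those best matches are summed. Below is a minimal pure-Python sketch of that scoring rule. The toy 2-dimensional vectors and the function names are made up for illustration and are not taken from any LightOn or Weaviate API.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for every query token vector, take the highest
    cosine similarity against any document token vector, then sum the maxima."""
    q = [_normalize(v) for v in query_vecs]
    d = [_normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


# Toy per-token embeddings (hypothetical values, one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has an exact match for each query token
doc_b = [[1.0, 1.0]]                          # one "averaged" vector, like a single-vector model

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

In this toy case `doc_a` scores higher than `doc_b` because each query token finds its own exact match, which is the token-level granularity the post refers to.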

Bob van Luijt

@bobvanluijt

a month ago

🙌 More than 200 AI developers just weighed in via IT Brand Pulse, and Weaviate swept the vector database categories! 🥇 𝗠𝗮𝗿𝗸𝗲𝘁 𝗟𝗲𝗮𝗱𝗲𝗿 🥇 𝗜𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝗰𝗲 & 𝗜𝗻𝗻𝗼𝘃𝗮𝘁𝗶𝗼𝗻 𝗟𝗲𝗮𝗱𝗲𝗿 🔗 Read the full breakdown here: itbrandpulse.com/wp-content/upl…

6 likes · 0 replies · 3 reposts

Victoria Slocum

@victorialslocum

a month ago

If you're building a PDF RAG pipeline: Should you be using OCR and 𝘁𝗲𝘅𝘁-𝗯𝗮𝘀𝗲𝗱 𝗿𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹 methods, or just 𝗲𝗺𝗯𝗲𝗱 𝗶𝗺𝗮𝗴𝗲𝘀 𝗱𝗶𝗿𝗲𝗰𝘁𝗹𝘆 using late interaction models? This paper says the answer might actually be 𝘣𝘰𝘵𝘩. My colleagues at Weaviate

748 likes · 19 replies · 102 reposts

Connor Shorten

@cshorten30

a month ago

x.com/i/article/2038…

22 likes · 1 reply · 8 reposts

Weaviate Podcast

@weaviatepodcast

a month ago

A recap of Multi-Vector Search 🎙️💚 with Amélie Chatelain and Antoine Chaffin 👇

9 likes · 0 replies · 7 reposts

Connor Shorten

@cshorten30

a month ago

What is the state of Cross Encoder API latency? 🤔 I just ran a quick latency test for Cohere and Voyage's `rerank` APIs. ⚙️ The test sweeps across reranking K = 10, 20, 50, 100, 200, and 500 documents. Each document has an average of ~1,000 tokens. Here are the fastest scores

20 likes · 1 reply · 4 reposts
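A sweep like the one described in the post above can be sketched as a small timing harness. This is only an illustration under stated assumptions: `fake_rerank` is a hypothetical local stand-in, not the Cohere or Voyage client, and the toy documents are far shorter than the ~1,000-token documents mentioned in the post. Swapping in a real API call would be needed to reproduce actual latencies.

```python
import statistics
import time


def fake_rerank(query, documents):
    """Hypothetical stand-in for a hosted rerank call.
    Orders documents by word overlap with the query; replace with a real client."""
    query_words = set(query.split())
    return sorted(
        range(len(documents)),
        key=lambda i: -len(query_words & set(documents[i].split())),
    )


def sweep_latency(rerank_fn, query, corpus, ks=(10, 20, 50, 100, 200, 500), trials=3):
    """Time rerank_fn on the first K documents for each K and keep the
    fastest and median wall-clock durations, in seconds."""
    results = {}
    for k in ks:
        docs = corpus[:k]
        timings = []
        for _ in range(trials):
            start = time.perf_counter()
            rerank_fn(query, docs)
            timings.append(time.perf_counter() - start)
        results[k] = {"min_s": min(timings), "median_s": statistics.median(timings)}
    return results


# Toy corpus (made up); a real test would use documents of realistic length.
corpus = [f"document {i} about late interaction retrieval" for i in range(500)]
report = sweep_latency(fake_rerank, "late interaction", corpus)
```

Reporting the minimum over several trials mirrors the "fastest scores" framing in the post, while the median gives a rougher sense of typical latency.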

Weaviate Podcast

@weaviatepodcast

a month ago

You can also keep up with recaps of the Weaviate Podcast on Substack! 📬 👉 weaviatepodcast.substack.com

8 likes · 0 replies · 4 reposts

Antoine Chaffin

@antoine_chaffin

a month ago

Until we have multimodal search at the same level of text search, this is probably the best way of having a searchable format of our discussions! Very thorough summary and very happy to have it to share and go back to!

38 likes · 3 replies · 6 reposts

Connor Shorten

@cshorten30

a month ago

I learned a lot from our discussion of Reason-ModernColBERT and Reasoning-Intensive Retrieval 🧠 Firstly, check out the ReasonIR dataset from Meta if you haven't already! This is an incredible resource for training search models! 🛠️ Secondly, there are two things going on with

80 likes · 1 reply · 11 reposts

Antoine Chaffin

@antoine_chaffin

a month ago

Fine-tuning Reason-ModernColBERT on the AgentIR data and appending reasoning traces boosted the accuracy on BrowseComp-Plus by 10% with OSS, btw. Great job, Zijian Chen!

52 likes · 1 reply · 10 reposts

Connor Shorten

@cshorten30

a month ago

Tomorrow, #135 🎙️💚

15 likes · 0 replies · 5 reposts

Weaviate Podcast

@weaviatepodcast

a month ago

Don't miss new episodes of the Weaviate Podcast! Subscribe on YouTube! 👇

4 likes · 1 reply · 3 reposts

Connor Shorten

@cshorten30

a month ago

Hey everyone! I am SUPER EXCITED to publish a new episode of the Weaviate Podcast with Shreya Shankar (@sh_reya) on Data Agents! 👾 Shreya is a Ph.D. student in the EPIC Data Lab (@UCBEPIC) advised by Aditya Parameswaran (@adityagp) at UC Berkeley. Her research focuses on

35 likes · 2 replies · 14 reposts

Weaviate Podcast

@weaviatepodcast

a month ago

Weaviate Podcast #135 is live! Data Agents! 🎙️💚🔥

6 likes · 2 replies · 6 reposts

Weaviate Podcast

@weaviatepodcast

a month ago

What are Data Agents? 👾

7 likes · 1 reply · 4 reposts

Weaviate Podcast

@weaviatepodcast

a month ago

The Data Agent Benchmark measures how well AI Agents can handle complex queries across Multiple Database Systems! 🎯👾

7 likes · 1 reply · 4 reposts

Weaviate Podcast

@weaviatepodcast

a month ago

Why are there so many Databases?!? 👇

4 likes · 1 reply · 5 reposts

Weaviate Podcast

@weaviatepodcast

a month ago

What if you could filter the objects in a database with natural language commands, rather than relying on pre-computed columns and attributes? ✨ This is the idea behind Semantic Operators, also known as AI SQL. 👾 This clip explains this idea further, one of the most

10 likes · 2 replies · 3 reposts
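As a rough illustration of the semantic-filter idea in the post above: the filter condition is a natural-language instruction, and a judge function decides row by row whether it holds. In a real system the judge would be an LLM call; here `keyword_judge` is a hypothetical placeholder that only checks word overlap, and `sem_filter` is an illustrative name, not an API from any specific library.

```python
def keyword_judge(instruction, row):
    """Hypothetical placeholder for an LLM judgment.
    Returns True when the instruction and the row text share at least one word."""
    row_text = " ".join(str(value) for value in row.values()).lower()
    return bool(set(instruction.lower().split()) & set(row_text.split()))


def sem_filter(rows, instruction, judge=keyword_judge):
    """Keep the rows for which the judge says the natural-language
    instruction holds, instead of testing a pre-computed column."""
    return [row for row in rows if judge(instruction, row)]


# Toy rows (made up for illustration).
rows = [
    {"title": "Intro to vector databases"},
    {"title": "Sourdough baking tips"},
]
matches = sem_filter(rows, "about databases")
```

The design point is that the predicate is evaluated per row at query time, so no `topic` column has to exist in advance; the cost is one judge call per row.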

ℏεsam

@hesamation

a month ago

Dear recruiters, if you are writing a job posting for AI Engineering, here is how long each tool has been available, so you don't make a fool of yourself: TensorFlow: 17 years MCP: 6 years vLLM: 7 years Ollama: 10 years CrewAI: 12 years CUDA: 25 years JAX: 11 years Weaviate: 14

2.2K likes · 79 replies · 101 reposts