Felix (@felix1987_)'s Twitter Profile
Felix

@felix1987_

Senior Software Engineer @JinaAI_

ID: 13790222

Joined: 21-02-2008 21:28:35

1.1K Tweets

396 Followers

819 Following

antoine (@antoinelouis_)'s Twitter Profile Photo

Can hybrid search boost retrieval performance in a non-English, highly specialized domain like law?

In my final PhD paper (COLING 2025), I investigate this question by combining various retrieval techniques across two realistic scenarios:
1️⃣ zero-shot, where we assume **no**
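One common way to combine lexical and dense retrievers in a hybrid setup is reciprocal rank fusion (RRF). This is a minimal sketch of RRF, not necessarily the fusion method used in the paper; the ranked lists and doc ids are made up for illustration.

```python
# Reciprocal Rank Fusion: merge ranked lists from different retrievers
# (e.g., BM25 and a dense embedding model) into a single ranking.
# score(d) = sum over lists of 1 / (k + rank_of_d_in_that_list)

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc ids into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["case_12", "case_07", "case_33"]   # lexical retriever (toy)
dense_ranking = ["case_07", "case_33", "case_12"]  # dense retriever (toy)
fused = rrf_fuse([bm25_ranking, dense_ranking])    # "case_07" wins: top-2 in one list, top-1 in the other
```

The constant k=60 is the value commonly used in the original RRF formulation; it dampens the influence of very top ranks so no single retriever dominates.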
Felix (@felix1987_)'s Twitter Profile Photo

Is it another win for Bayes' theorem? EM seems to add a conditional distribution shift on top of the prior knowledge (the pretrained model).

Han Xiao (@hxiao)'s Twitter Profile Photo

We've got some Research Scientist Intern spots open at Jina AI in the Bay Area. Hit me up if you're into embeddings, rerankers, or small language models for better search. Would be awesome if you've worked with long context before, or have experience with

Felix (@felix1987_)'s Twitter Profile Photo

We released a new model: jina-embeddings-v4 - a universal embedding model for multimodal and multilingual retrieval.
- trained on a wider scope than DSE, ColPali, etc.
- supports MRL, late interaction, etc.
🤗huggingface.co/jinaai/jina-em… jina.ai/news/jina-embe…
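MRL (Matryoshka Representation Learning) support means an embedding can be truncated to its leading dimensions and re-normalized, trading accuracy for storage. A minimal sketch of that truncation step, using a random vector in place of an actual jina-embeddings-v4 output:

```python
import numpy as np

# Matryoshka-style truncation: keep the first d dimensions of an
# MRL-trained embedding, then L2-normalize so cosine similarity
# still works. The vector here is random, purely for illustration.

def truncate_embedding(vec, d):
    """Keep the first d dimensions and L2-normalize the result."""
    v = np.asarray(vec, dtype=np.float32)[:d]
    return v / np.linalg.norm(v)

full = np.random.default_rng(0).normal(size=2048)  # stand-in for a model output
short = truncate_embedding(full, 128)              # 16x smaller index footprint
```

The re-normalization matters: without it, truncated vectors have smaller norms and dot-product scores are no longer comparable across dimension choices.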

Jina AI (@jinaai_)'s Twitter Profile Photo

Submodular optimization for token/sentence selection from long contexts. Here's an interesting experiment: first used jina-embeddings-v4's multi-vector feature to extract token-level embeddings from a passage, then applied submodular optimization to cherry-pick the tokens that provide
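The greedy algorithm for a submodular objective like facility location is simple: repeatedly pick the token whose embedding most improves "coverage" of all token embeddings. This is a toy sketch of that idea with random embeddings standing in for the multi-vector output of jina-embeddings-v4; the tweet does not specify which submodular function was used.

```python
import numpy as np

# Greedy maximization of a facility-location objective:
#   F(S) = sum_j max_{i in S} sim(i, j)
# i.e., pick k tokens whose embeddings best cover all tokens.

def select_tokens(emb, k):
    """Greedily select k row indices from emb (n_tokens, dim)."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T                       # cosine similarity matrix
    coverage = np.zeros(len(emb))           # best similarity to any picked token
    selected = []
    for _ in range(k):
        # marginal gain of adding each candidate token
        gains = np.maximum(sim, coverage).sum(axis=1) - coverage.sum()
        gains[selected] = -np.inf           # never pick the same token twice
        best = int(np.argmax(gains))
        selected.append(best)
        coverage = np.maximum(coverage, sim[best])
    return selected

token_embs = np.random.default_rng(0).normal(size=(10, 8))  # toy token embeddings
picked = select_tokens(token_embs, 3)
```

Because facility location is monotone submodular, this greedy loop carries the classic (1 - 1/e) approximation guarantee relative to the optimal subset.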

Felix (@felix1987_)'s Twitter Profile Photo

Congrats! Contextual chunk embedding is undervalued. It's good to see more and more players pushing it forward after jina-v3's late chunking.

Jina AI (@jinaai_)'s Twitter Profile Photo

New benchmark drops: JinaVDR (Visual Document Retrieval) evals how well retrieval models handle real-world visual documents across 95 tasks in 20 languages—think layouts packed with graphs, charts, tables, text, images. We're talking scanned docs, screenshots, the works. JinaVDR pairs

Felix (@felix1987_)'s Twitter Profile Photo

The best paper I have read this year! 

Key takeaway: ... may reflect a fundamental shift in the era of LLMs, where underfitting becomes less problematic than overfitting.
Michael Günther (@michael_g_u)'s Twitter Profile Photo

We are at Qdrant's Vector Space Day 🚀 in Berlin on Sep 26. We'll talk about "Vision-Language Models: A New Architecture for Multi-Modal Embedding Models" and also share some insights and learnings we gained while training jina-embeddings-v4.
🎫 lu.ma/p7w9uqtz

Jina AI (@jinaai_)'s Twitter Profile Photo

Today we're releasing jina-code-embeddings, a new suite of code embedding models in two sizes—0.5B and 1.5B parameters—along with 1–4 bit GGUF quantizations for both. Built on the latest code-generation LLMs, these models achieve SOTA retrieval performance despite their compact size.
