Isabelle Mohr (@isabelle_mohr)'s Twitter Profile
Isabelle Mohr

@isabelle_mohr

MLE @JinaAI_ 🤖 Interested in all things Machine Learning!

ID:1518959390564524032

Joined: 26-04-2022 14:25:23

52 Tweets

114 Followers

167 Following

Isabelle Mohr (@isabelle_mohr)

This week I explored chunking methods: the Semantic Chunker from LlamaIndex 🦙 on the jinaai/wikisections dataset on Hugging Face. Varying the buffer size had pretty much no effect, while increasing the breakpoint percentile threshold increased chunking precision by a lot! Jina AI
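
The two settings mentioned in the tweet can be illustrated with a minimal plain-Python sketch of percentile-based semantic chunking. This is not LlamaIndex's code: it assumes a naive regex sentence splitter and a hypothetical toy_embed function standing in for a real embedding model, and the parameter names buffer_size and breakpoint_percentile_threshold simply mirror the terms used in the tweet. Each sentence is embedded together with buffer_size neighbours on each side, cosine distances between consecutive windows are computed, and a chunk boundary is placed wherever a distance exceeds the chosen percentile of all distances.

```python
import math
import re
from typing import Callable, List

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values: List[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between sorted values."""
    ordered = sorted(values)
    rank = (q / 100) * (len(ordered) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def semantic_chunk(
    text: str,
    embed: Callable[[str], Vector],
    buffer_size: int = 1,
    breakpoint_percentile_threshold: float = 95,
) -> List[str]:
    # Naive sentence split (an assumption): break after ".", "!" or "?" plus whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) < 2:
        return sentences
    # Embed each sentence together with `buffer_size` neighbours on each side.
    windows = [
        " ".join(sentences[max(0, i - buffer_size): i + buffer_size + 1])
        for i in range(len(sentences))
    ]
    vectors = [embed(w) for w in windows]
    # Semantic "jump" between each pair of consecutive windows.
    distances = [cosine_distance(u, v) for u, v in zip(vectors, vectors[1:])]
    # Only jumps strictly above the chosen percentile become chunk boundaries.
    cutoff = percentile(distances, breakpoint_percentile_threshold)
    chunks: List[str] = []
    current = [sentences[0]]
    for next_sentence, distance in zip(sentences[1:], distances):
        if distance > cutoff:
            chunks.append(" ".join(current))
            current = []
        current.append(next_sentence)
    chunks.append(" ".join(current))
    return chunks


def toy_embed(text: str) -> Vector:
    # Hypothetical stand-in for a real embedding model: counts two topic words.
    words = text.lower().split()
    return [float(words.count("cats")), float(words.count("rust"))]


doc = "Cats purr. Cats nap. Rust compiles. Rust is fast."
print(semantic_chunk(doc, toy_embed, buffer_size=1))
# ['Cats purr. Cats nap.', 'Rust compiles. Rust is fast.']
```

In this sketch, raising breakpoint_percentile_threshold keeps only the largest jumps as boundaries, so it yields fewer, larger chunks, while buffer_size only widens the context each embedding sees.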

Isabelle Mohr (@isabelle_mohr)

Last night I had the pleasure of giving a talk together with Saba Sturua at the Data Meetup Berlin hosted by Netlight! I loved the knowledge sharing and, most importantly, connecting with so many passionate people in the field. See you at the next one! #embeddings

Isabelle Mohr (@isabelle_mohr)

I'll be giving a talk together with Saba Sturua next week in Berlin about our German-English bilingual embedding model. If you wanna know how we trained this model and how to use it in a RAG pipeline, you'd better RSVP and attend! See ya there 🚀
meetup.com/data-meetup-be…
Jina AI

Benjamin Clavié (@bclavie)

Saw Jina AI's excellent long-context (8192!) ColBERT earlier today? Eager to give long-document ColBERT a shot?

New joint 🫅colbert-ai and 🪤RAGatouille release now supports any maximum length the underlying model can handle (& dynamically adjusts maxlen when encoding in-memory)

Xenova (@xenovacom)

A few days ago, Jina AI released two new bilingual embedding models (German-English & Chinese-English), each supporting a max sequence length of 8K tokens! 🤯

... and now you can use them with 🤗 Transformers.js, for cross-language retrieval, clustering, and so much more! 👇

S (@CapricaReloaded)

After many months, we have our first bilingual embedding models ready 🎉😭

German-English: huggingface.co/jinaai/jina-em…
Chinese-English: huggingface.co/jinaai/jina-em…

We made a bunch of evaluations (post coming) and we're convinced bilingual tops multilingual

Looking forward to feedback!

Bo (@bo_wangbo)

We’re finally here with 2 new models, which we call bilingual embedding models. They allow you to perform monolingual and cross-lingual retrieval tasks. Future models will always be X+EN, where X is the main language and EN serves as the bridging language. Here are the first two:

German-English…

Michael Günther (@michael_g_u)

Our German-English and Chinese-English embedding models are open-source now 🚀
huggingface.co/jinaai/jina-em…
huggingface.co/jinaai/jina-em…

Isabelle Mohr (@isabelle_mohr)

🚀 Exciting News! 🌐 We just released our groundbreaking bilingual embedding models! 🤩🔓 They're free to use, download, and fine-tune from HF. Find them here:
jinaai/jina-embeddings-v2-base-de
jinaai/jina-embeddings-v2-base-zh

Jina AI (@JinaAI_)

Learn the history of text embeddings with our exclusive infographic poster, illustrating the groundbreaking evolution over the last 74 years. jina.ai/news/the-1950-…

🔍 Educational & Insightful - A timeline that offers a detailed look into the advancements from Bag of Words to…

Isabelle Mohr (@isabelle_mohr)

Here we go!! 🚀 The Jina Embeddings v2 model is doing really well, and it gets even better when we add the bge-reranker-large model as a reranker (hit rate of 0.94, MRR of 0.87). Reranking makes a big difference!
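
Hit rate and MRR (mean reciprocal rank) are the two retrieval metrics quoted in the tweet. A minimal sketch of how they are commonly computed, assuming one relevant document per query and a top-k cutoff; the tweet states neither k nor the evaluation setup, and the query results below are invented purely for illustration. Hit rate is the fraction of queries whose relevant document appears in the top k; MRR averages 1/rank of that document, counting 0 when it is missing.

```python
from typing import Sequence


def hit_rate(ranked: Sequence[Sequence[str]], relevant: Sequence[str], k: int) -> float:
    """Fraction of queries whose relevant doc id appears in the top-k results."""
    hits = sum(1 for docs, gold in zip(ranked, relevant) if gold in docs[:k])
    return hits / len(relevant)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], relevant: Sequence[str], k: int) -> float:
    """Average of 1/rank of the relevant doc within the top k (0 if absent)."""
    total = 0.0
    for docs, gold in zip(ranked, relevant):
        top_k = list(docs[:k])
        if gold in top_k:
            total += 1.0 / (top_k.index(gold) + 1)  # ranks are 1-based
    return total / len(relevant)


# Hypothetical ranked results for three queries (doc ids, best first).
ranked = [["d1", "d2", "d3"], ["d5", "d4", "d6"], ["d7", "d8", "d9"]]
relevant = ["d1", "d4", "d0"]

print(hit_rate(ranked, relevant, k=3))              # 0.6666666666666666
print(mean_reciprocal_rank(ranked, relevant, k=3))  # 0.5
```

Because a reranker only reorders the retrieved candidates, it can raise MRR by moving the relevant document toward rank 1 even when the hit rate at the same k stays unchanged.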

Han Xiao (@hxiao)

How did we beat OpenAI's text-embedding-ada-002 at 8K token length? When and why does an 8K token length matter for embeddings? Read our paper released today: arxiv.org/abs/2310.19923

Isabelle Mohr (@isabelle_mohr)

We collected all our insights about our embedding models with extra-long context length!

This paper takes you all the way, from the training process to the evaluation on long texts. Take a look 👀

🤗huggingface.co/jinaai/jina-em…
