Isabelle Mohr (@isabelle_mohr) Twitter Tweets • TwiCopy

Isabelle Mohr

@isabelle_mohr

MLE @JinaAI_ 🤖 Interested in all things Machine Learning!

ID:1518959390564524032

Joined: 26-04-2022 14:25:23

52 Tweets

114 Followers

167 Following

Isabelle Mohr

3 weeks ago

This week I explored chunking methods: the Semantic Chunker from LlamaIndex 🦙 on the jinaai/wikisections dataset on Hugging Face. Varying the buffer size had pretty much no effect, while increasing the breakpoint percentile threshold increased chunking precision by a lot! Jina AI
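To illustrate what a breakpoint percentile threshold does, here is a minimal, self-contained sketch. It is not the LlamaIndex implementation: the function names, the toy 2-D "embeddings", and the linear-interpolation percentile are all assumptions made for illustration, and the buffer size (how many neighbouring sentences are grouped before embedding) is left out entirely. The idea shown is only this: compute the distance between consecutive sentence embeddings and start a new chunk wherever that distance exceeds the chosen percentile of all distances.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def split_at_breakpoints(sentences, embeddings, threshold_percentile):
    """Start a new chunk where the distance to the previous sentence
    is above the given percentile of all consecutive distances."""
    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    cutoff = percentile(distances, threshold_percentile)
    chunks, current = [], [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > cutoff:
            chunks.append(current)
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(current)
    return chunks


# Hypothetical toy data: unit vectors at hand-picked angles (degrees)
# standing in for real sentence embeddings.
angles = [0, 10, 60, 65, 150, 155]
embeddings = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in angles]
sentences = [f"s{i}" for i in range(len(angles))]

chunks_p50 = split_at_breakpoints(sentences, embeddings, 50)
chunks_p90 = split_at_breakpoints(sentences, embeddings, 90)
print(len(chunks_p50), len(chunks_p90))
```

In this toy setup a higher percentile keeps only the largest topic jumps as breakpoints, so it yields fewer, larger chunks. Whether that improves precision on a real dataset depends on the data and the embedding model.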

8 likes, 0 replies

Isabelle Mohr

3 weeks ago

Last night I had the pleasure of giving a talk together with Saba Sturua at the Data Meetup Berlin hosted by Netlight! Love the knowledge sharing, and most importantly, to connect with so many passionate and interested people in the field. See you at the next one!

#embeddings

3 likes, 0 replies

Isabelle Mohr

1 month ago

I'll be giving a talk together with Saba Sturua next week in Berlin about our German-English bilingual embedding model. If you wanna know how we trained this model and how to use it in a RAG pipeline, you better RSVP and attend! See ya there 🚀
meetup.com/data-meetup-be…
Jina AI

7 likes, 0 replies

Bo

2 months ago

A ColBERT variant, but it supports a bit longer context :)

huggingface.co/jinaai/jina-co…

cc Jo Kristian Bergum Omar Khattab Benjamin Clavié

148 likes, 0 replies

Benjamin Clavié

2 months ago

Saw Jina AI's excellent long context (8192!) ColBERT earlier today? Eager to give long-document ColBERT a shot?

New joint🫅colbert-ai and🪤RAGatouille release now supports any maximum length the underlying model can handle (& dynamically adjusts maxlen when encoding in-memory)

85 likes, 0 replies

Bo

3 months ago

Jina AI embeddings now supported by sbert 2.3.0 :)

8 likes, 0 replies

Xenova

3 months ago

A few days ago, Jina AI released two new bilingual embedding models (German-English & Chinese-English), each supporting a max sequence length of 8K tokens! 🤯

... and now you can use them with 🤗 Transformers.js, for cross-language retrieval, clustering, and so much more! 👇

37 likes, 0 replies

S

@CapricaReloaded

3 months ago

After many months, we have our first bilingual embedding models ready 🎉😭

German-English: huggingface.co/jinaai/jina-em…
Chinese-English: huggingface.co/jinaai/jina-em…

We made a bunch of evaluations (post coming) and we're convinced bilingual tops multilingual

Looking forward to feedback!

7 likes, 0 replies

Bo

3 months ago

We’re finally here with two new models. We call them bilingual embedding models: they let you perform monolingual and cross-lingual retrieval tasks. Future models will always be X+EN, where X is the main language and EN is the bridging language. Here are the first two:

German-English…

99 likes, 0 replies

Michael Günther

3 months ago

Our German-English and Chinese-English embedding models are open-source now 🚀
huggingface.co/jinaai/jina-em…
huggingface.co/jinaai/jina-em…

19 likes, 0 replies

Isabelle Mohr

3 months ago

🚀 Exciting News! 🌐 We just released our groundbreaking bilingual embedding models! 🤩🔓 They're free to use, download, and finetune from HF. Find them here: jinaai/jina-embeddings-v2-base-de
jinaai/jina-embeddings-v2-base-zh #OpenSource #NLP #BilingualEmbeddings #AIInnovation

6 likes, 0 replies

Isabelle Mohr

4 months ago

So excited to be joining this meetup and talking about Jina Embeddings V2 ✨🥳

6 likes, 0 replies

Jina AI

4 months ago

Learn the history of text embeddings with our exclusive infographic poster, illustrating the groundbreaking evolution over the last 74 years. jina.ai/news/the-1950-…

🔍 Educational & Insightful - A timeline that offers a detailed look into the advancements from Bag of Words to…

48 likes, 0 replies

Isabelle Mohr

6 months ago

Here we go!!🚀Jina Embeddings v2 model is doing really well, and it gets even better when we use the bge-reranker-large model (with a hit rate of 0.94 and an MRR of 0.87). Reranking makes a big difference!
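For readers unfamiliar with the two metrics quoted above, here is a small sketch of how hit rate and MRR (mean reciprocal rank) are commonly computed for retrieval. The function name and the toy document IDs are hypothetical, and the sketch assumes exactly one relevant document per query; it does not reproduce the evaluation behind the 0.94 and 0.87 figures.

```python
def hit_rate_and_mrr(ranked_ids_per_query, relevant_id_per_query):
    """Hit rate: share of queries whose relevant doc appears in the results.
    MRR: mean of 1/rank of the relevant doc (0 when it is missing)."""
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked_ids, relevant_id in zip(ranked_ids_per_query, relevant_id_per_query):
        if relevant_id in ranked_ids:
            hits += 1
            rank = ranked_ids.index(relevant_id) + 1  # ranks are 1-based
            reciprocal_rank_sum += 1.0 / rank
    n_queries = len(relevant_id_per_query)
    return hits / n_queries, reciprocal_rank_sum / n_queries


# Hypothetical toy results: top-3 retrieved doc IDs for four queries.
ranked = [
    ["d1", "d7", "d3"],  # relevant doc at rank 1
    ["d4", "d2", "d9"],  # relevant doc at rank 2
    ["d5", "d6", "d8"],  # relevant doc not retrieved
    ["d0", "d1", "d2"],  # relevant doc at rank 1
]
relevant = ["d1", "d2", "d3", "d0"]

hit_rate, mrr = hit_rate_and_mrr(ranked, relevant)
print(hit_rate, mrr)
```

A reranker reorders the retrieved list, so it can raise MRR by moving the relevant document closer to rank 1. Hit rate only changes if the reranker also changes which documents make the final cut.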

5 likes, 0 replies

Isabelle Mohr

6 months ago

Our Embedding API is live! 🚀

4 likes, 0 replies

Han Xiao

6 months ago

How did we beat OpenAI's text-embedding-ada-002 on 8K token length? When and why does 8K token length matter for embeddings? Read our paper released today: arxiv.org/abs/2310.19923

144 likes, 0 replies

Isabelle Mohr

6 months ago

We collected all our insights about our embedding models with extra long context length!

This paper takes you all the way, from the training process to the evaluation on long texts. Take a look 👀

🤗huggingface.co/jinaai/jina-em…

3 likes, 0 replies