Matthew Leavitt (@leavittron)'s Twitter Profile
Matthew Leavitt

@leavittron

Chief Science Officer, Co-Founder @datologyai. Former: Head of Data Research @MosaicML; FAIR. 🧠 and 🤖 intelligence // views are from nowhere

ID: 269994694

http://mleavitt.net · Joined 21-03-2011 20:36:34

2.2K Tweets

2.2K Followers

890 Following

Matthew Leavitt (@leavittron)

The team absolutely crushed it here. They blew away nearly every CLIP baseline, and matched or exceeded SigLIP2 (which uses a slew of training-algorithm improvements) on a number of benchmarks. USING. DATA. CURATION. ONLY. I'm so proud of Ricardo Monti, Haoli Yin,

๐š๐”ช๐Ÿพ๐šก๐šก๐Ÿพ (@gm8xx8) 's Twitter Profile Photo

Datology CLIP Models

DatologyAI releases two SOTA CLIP ViT-B/32 variants: classification-optimized and retrieval-optimized, achieving top results through task-specific data curation alone. 

Model
- ViT-B/32 (86M params), OpenCLIP 2.24.0
- No architecture or training changes
-
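
For context on what "no architecture or training changes" means in practice: loading and querying a standard ViT-B/32 through OpenCLIP would look like the sketch below. This is a minimal, hypothetical example; the `pretrained="openai"` tag is a stand-in, since the tweet doesn't give the actual DatologyAI checkpoint identifiers.

```python
# Minimal sketch: zero-shot image/text matching with a stock ViT-B/32
# in OpenCLIP. "openai" is a placeholder pretrained tag; a curated-data
# checkpoint with the same architecture would load the same way
# (assumption based on the tweet, not a verified model ID).
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("example.jpg")).unsqueeze(0)
texts = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    img_feats = model.encode_image(image)
    txt_feats = model.encode_text(texts)
    # Normalize and compare: zero-shot classification is just cosine
    # similarity between the image embedding and caption embeddings.
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
    txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feats @ txt_feats.T).softmax(dim=-1)

print(probs)  # probabilities over the candidate captions
```
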
Ari Morcos (@arimorcos)

Congratulations to our friends and partners Arcee.ai on the release of AFM-4.5B! With data powered by DatologyAI, this model outperforms Gemma3-4B and is competitive with Qwen3-4B despite being trained on a fraction of the data.

๐š๐”ช๐Ÿพ๐šก๐šก๐Ÿพ (@gm8xx8) 's Twitter Profile Photo

AFM - Arcee Foundation Models.
Built from scratch for enterprise. The first release, AFM-4.5B, is a 4.5B open-weight model that runs anywhere: cloud, edge, or CPU. Trained on rigorously filtered data with full deployment flexibility.

I don't say this lightly → DON'T
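
Since AFM-4.5B is open-weight, running it locally (including CPU-only) would follow the standard Hugging Face pattern sketched below. The repo ID "arcee-ai/AFM-4.5B" is assumed from the announcement, not verified here.

```python
# Minimal sketch: CPU inference with an open-weight causal LM via
# Hugging Face transformers. "arcee-ai/AFM-4.5B" is an assumed repo ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/AFM-4.5B"  # assumed; check the actual release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # CPU by default

inputs = tokenizer("Data curation matters because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
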
Bogdan Gaza (@hurrycane)

We've definitely seen signs of this already; perhaps not surprisingly, post-training people tend to care more about the value of data. We see a number of companies turning to DatologyAI to get the most out of their existing datasets!

Pratyush Maini (@pratyushmaini)

One of the dreams when joining DatologyAI was to bring the fruits of data research from labs 🔬 to the real world 🌎. So gratifying to see that our algorithms are out in the open, enabling companies to rival the sophisticated Qwen & Gemma families at a fraction of the cost!

Matthew Leavitt (@leavittron)

Partnering w/ Arcee.ai was a blast. This is the first public language model pretrained on DatologyAI-curated data, and we're pleased (though not surprised) that it goes toe-to-toe w/ the best small models. The base model is 🔥 and Arcee's post-training expertise (S-tier) really

Max Azoury (@maxwellazoury)

@kalomaze Aside from the "big guys" (Gemma3, Llama 3.3, Qwen-MAX), Arcee models have ALWAYS, ALWAYS been the best in terms of not being lobotomites. I know kalomaze works with Prime Intellect (who has used Arcee)… but people need to understand, Arcee is the GOAT of posttraining. And their