DatologyAI (@datologyai)'s Twitter Profile
DatologyAI

@datologyai

DatologyAI builds tools to automatically select and optimize the best data on which to train AI models, leading to better models that train faster.

ID: 1699892979220283392

Link: http://www.datologyai.com | Joined: 07-09-2023 21:10:50

112 Tweets

1.1K Followers

35 Following

Felicis (@felicis)'s Twitter Profile Photo

Everyone should be able to train and deploy an AI model. DatologyAI uses data curation to reduce training costs and make smaller models better. CEO @AriMorcos explains his team's mission in this talk with Felicis GPs Viviana Faga & Astasia Myers. Full video:

Cody Blakeney (@code_star)'s Twitter Profile Photo

I’m super excited to announce that I’ve joined DatologyAI. I’ll be working with the research team to make the highest-quality data curation available to more than just the frontier labs. I’m also in the Bay Area now! Hit me up if you want to grab coffee or something.

Lucas Atkins (@lucasatkins7)'s Twitter Profile Photo

What an insane get for an insane team. We’ve been working closely with DatologyAI, and I assure you that, if anything, they sell themselves way short. They’re the real deal.

Thao Nguyen (@thao_nguyen26)'s Twitter Profile Photo

📢 Announcing our data-centric workshop at ICML 2025 on unifying data curation frameworks across domains!

📅 Deadline: May 24, AoE
🔗 Website: dataworldicml2025.github.io

We have an amazing lineup of speakers + panelists from various institutions and application areas.
Ari Morcos (@arimorcos)'s Twitter Profile Photo

We couldn't agree more. If you also believe this, come work with us at DatologyAI to help drive frontier research and engineering in making the best training data possible.

Ricardo Monti (@ricardomonti9)'s Twitter Profile Photo

DatologyAI is back: state-of-the-art CLIP model performance using data curation alone 🚀

✅ state-of-the-art ViT-B/32 performance: ImageNet-1k 76.9% vs. 74% reported by SigLIP 2
✅ 8x training efficiency gains
✅ 2x inference efficiency gains
✅ Public model release

Details in
Lucas Atkins (@lucasatkins7)'s Twitter Profile Photo

DatologyAI is pushing the frontier, with data curation as its standout advantage. After working closely with the team over the past few months, I’ve seen their dedication, drive, and depth of expertise firsthand. Kudos and congratulations to everyone on the team.

𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8)'s Twitter Profile Photo

Datology CLIP Models

DatologyAI releases two SOTA CLIP ViT-B/32 variants: classification-optimized and retrieval-optimized, achieving top results through task-specific data curation alone.

Model
- ViT-B/32 (86M params), OpenCLIP 2.24.0
- No architecture or training changes
-
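Since the spec above describes a stock ViT-B/32 trained with unmodified OpenCLIP, the released checkpoints should load through the standard OpenCLIP API. A minimal sketch, assuming `open_clip_torch` is installed; the `pretrained` tag below is a well-known public OpenCLIP tag used as a stand-in, not the actual DatologyAI checkpoint identifier:

```python
# Hedged sketch: loading a ViT-B/32 CLIP model via OpenCLIP and comparing
# embeddings. The pretrained tag is a placeholder -- substitute the
# identifier DatologyAI actually publishes for its released checkpoints.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (CLIP's scoring metric)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))


def load_clip_vit_b32():
    """Load a ViT-B/32 CLIP model if open_clip is available, else None."""
    try:
        import open_clip  # pip install open_clip_torch
    except ImportError:
        return None
    # "laion2b_s34b_b79k" is a standard public OpenCLIP tag, used here only
    # as a stand-in for the (unknown to me) DatologyAI checkpoint name.
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k"
    )
    tokenizer = open_clip.get_tokenizer("ViT-B-32")
    return model, preprocess, tokenizer


if __name__ == "__main__":
    # Toy embeddings to show the scoring step without downloading weights.
    sim = cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0]))
    print(round(sim, 4))  # -> 0.7071
```

Because the tweet states there are no architecture or training changes, any downstream code written against OpenCLIP 2.24.0 should work unchanged once the checkpoint name is swapped in.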
Lucas Atkins (@lucasatkins7)'s Twitter Profile Photo

We teamed up with DatologyAI to build what we believe is the strongest pretraining corpus in the world—and I truly think we nailed it. Their team was absolutely key to the model’s success. We started with ~23T tokens of high-quality data and distilled it down to 6.58T through

Ari Morcos (@arimorcos)'s Twitter Profile Photo

Congratulations to our friends and partners Arcee.ai on the release of AFM-4.5B! With data powered by DatologyAI, this model outperforms Gemma3-4B and is competitive with Qwen3-4B despite being trained on a fraction of the data.

Cody Blakeney (@code_star)'s Twitter Profile Photo

Training efficiency is hard, but getting easier to manage all the time. You can rent high-speed interconnected H100s on demand with just a credit card. The biggest single failure mode blocking people from training high-quality foundation models is data. But it doesn’t have to be. If