DatologyAI (@datologyai)'s Twitter Profile
DatologyAI

@datologyai

DatologyAI builds tools to automatically select and optimize the best data on which to train AI models, leading to better models that train faster.

ID: 1699892979220283392

Link: http://www.datologyai.com | Joined: 07-09-2023 21:10:50

112 Tweets

1.1K Followers

35 Following

Felicis (@felicis)'s Twitter Profile Photo

Everyone should be able to train and deploy an AI model. DatologyAI uses data curation to reduce training costs and make smaller models better. CEO @AriMorcos explains his team's mission in this talk with Felicis GPs Viviana Faga & Astasia Myers. Full video:

Cody Blakeney (@code_star)'s Twitter Profile Photo

I’m super excited to announce that I’ve joined DatologyAI. I’ll be working with the research team to make the highest-quality data curation available to more than just the frontier labs. I’m also in the Bay Area now! Hit me up if you want to grab coffee or something.

Lucas Atkins (@lucasatkins7)'s Twitter Profile Photo

What an insane get for an insane team. We’ve been working closely with DatologyAI, and I assure you that, if anything, they sell themselves way short. They’re the real deal.

Thao Nguyen (@thao_nguyen26)'s Twitter Profile Photo

📢 Announcing our data-centric workshop at ICML 2025 on unifying data curation frameworks across domains!

📅 Deadline: May 24, AoE
🔗 Website: dataworldicml2025.github.io

We have an amazing lineup of speakers + panelists from various institutions and application areas.
Ari Morcos (@arimorcos)'s Twitter Profile Photo

We couldn't agree more. If you also believe this, come work with us at DatologyAI to help drive frontier research and engineering in making the best training data possible.

Ricardo Monti (@ricardomonti9)'s Twitter Profile Photo

DatologyAI is back: state-of-the-art CLIP model performance using data curation alone 🚀

✅ state-of-the-art ViT-B/32 performance: ImageNet-1k 76.9% vs. 74% reported by SigLIP 2
✅ 8x training efficiency gains
✅ 2x inference efficiency gains
✅ Public model release

Details in
Lucas Atkins (@lucasatkins7)'s Twitter Profile Photo

DatologyAI is pushing the frontier, with data curation as its standout advantage. After working closely with the team over the past few months, I’ve seen their dedication, drive, and depth of expertise firsthand. Kudos and congratulations to everyone on the team.

𝚐𝔪𝟾𝚡𝚡𝟾 (@gm8xx8)'s Twitter Profile Photo

Datology CLIP Models

DatologyAI releases two SOTA CLIP ViT-B/32 variants: classification-optimized and retrieval-optimized, achieving top results through task-specific data curation alone.

Model
- ViT-B/32 (86M params), OpenCLIP 2.24.0
- No architecture or training changes
-
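Since the spec above describes a stock ViT-B/32 trained with unmodified OpenCLIP, the released checkpoints should load through the standard OpenCLIP API. A minimal sketch, assuming `open_clip_torch` is installed; the `pretrained` tag below is a well-known public OpenCLIP tag used as a stand-in, not the actual DatologyAI checkpoint identifier:

```python
# Hedged sketch: loading a ViT-B/32 CLIP model via OpenCLIP and comparing
# embeddings. The pretrained tag is a placeholder -- substitute the
# identifier DatologyAI actually publishes for its released checkpoints.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (CLIP's scoring metric)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))


def load_clip_vit_b32():
    """Load a ViT-B/32 CLIP model if open_clip is available, else None."""
    try:
        import open_clip  # pip install open_clip_torch
    except ImportError:
        return None
    # "laion2b_s34b_b79k" is a standard public OpenCLIP tag, used here only
    # as a stand-in for the (unknown to me) DatologyAI checkpoint name.
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k"
    )
    tokenizer = open_clip.get_tokenizer("ViT-B-32")
    return model, preprocess, tokenizer


if __name__ == "__main__":
    # Toy embeddings to show the scoring step without downloading weights.
    sim = cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0]))
    print(round(sim, 4))  # -> 0.7071
```

Because the tweet states there are no architecture or training changes, any downstream code written against OpenCLIP 2.24.0 should work unchanged once the checkpoint name is swapped in.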
Lucas Atkins (@lucasatkins7)'s Twitter Profile Photo

We teamed up with DatologyAI to build what we believe is the strongest pretraining corpus in the world—and I truly think we nailed it. Their team was absolutely key to the model’s success. We started with ~23T tokens of high-quality data and distilled it down to 6.58T through

Ari Morcos (@arimorcos)'s Twitter Profile Photo

Congratulations to our friends and partners Arcee.ai on the release of AFM-4.5B! With data powered by DatologyAI, this model outperforms Gemma3-4B and is competitive with Qwen3-4B despite being trained on a fraction of the data.

Cody Blakeney (@code_star)'s Twitter Profile Photo

Training efficiency is hard, but getting easier to manage all the time. You can rent high-speed interconnected H100s on demand with just a credit card. The biggest single failure mode blocking people from training high-quality foundation models is data. But it doesn’t have to be. If