Vineeth (@vineethdorna) Twitter Tweets • TwiCopy

Ari Morcos

5 months ago

Andrej Karpathy This is our exclusive focus at DatologyAI. Data quality is the single most underinvested area of ML research relative to its impact. We've already been able to achieve 10x efficiency gains over open-source datasets, and I'm confident there's still another 100x because there's

90 likes, 1 reply, 7 reposts

Matthew Leavitt

@leavittron

5 months ago

It depends on how much you know about what you're using your model for. You want your data to be as similar to your test distribution as possible. In practice, benchmarks are an incomplete description of your true test distribution, so you want to hedge diversity vs.

22 likes, 2 replies, 6 reposts

Vineeth

@vineethdorna

5 months ago

Cool summer for data-obsessed engineers. Come join us! 😎

9 likes, 0 replies, 0 reposts

sijia.liu

@sijialiu17

5 months ago

🚨 Excited to attend #ICML2025 and share our latest work (OPTML @MSU) on LLM unlearning -- think of it as AI surgery: removing harmful knowledge while preserving general utility. Catch us at: 🔹 [Paper 1] Tues, July 15 @ 4:30pm PT | E-1108 📄 Invariance Makes LLM Unlearning

38 likes, 0 replies, 12 reposts

Vineeth

@vineethdorna

4 months ago

Interesting stuff to grab. Visit our booth if you are at ICML.

5 likes, 0 replies, 1 repost

Matthew Leavitt

@leavittron

4 months ago

I used to claim that the “ML” in ICML stands for “Matthew Leavitt”. But this list of papers makes it hard to deny that the “M” in ICML actually stands for “Maini”

15 likes, 1 reply, 2 reposts

Cody Blakeney

@code_star

4 months ago

We got data-themed drinks on tap

11 likes, 0 replies, 2 reposts

Vaidehi Patil

@vaidehi_patil_

4 months ago

The MUGen workshop at #ICML2025 is happening now! Stop by for talks on adversarial ML, unlearning as rational belief revision, failure modes in unlearning, robust LLM unlearning, and the bright vs. dark side of forgetting in generative AI!

28 likes, 1 reply, 9 reposts

Cody Blakeney

@code_star

3 months ago

congrats to Pratyush Maini and Vineeth for finally getting this beast over the finish line! If you also want the best data money can buy, get in touch with DatologyAI!

7 likes, 0 replies, 2 reposts

Matthew Leavitt

@leavittron

3 months ago

Very excited to announce BeyondWeb, @datologyAI’s synthetic pretraining data generation paradigm. BeyondWeb is a rephrasing-based approach that substantially outperforms existing public synthetic pretraining data baselines, and is a core part of our curation pipeline.

280 likes, 4 replies, 41 reposts

Matthew Leavitt

@leavittron

3 months ago

21 likes, 0 replies, 3 reposts

Vineeth

@vineethdorna

3 months ago

Not just a team, an orchestra. 🎻 21 Datologists in sync. How often do you see this at your org?

5 likes, 0 replies, 0 reposts

Amro

@amrokamal1997

3 months ago

📖 Blog: blog.datologyai.com/beyondweb 📄 Arxiv: arxiv.org/abs/2508.10975 Shout-out to all the Datologists for this work, especially Pratyush Maini and Vineeth

1 like, 0 replies, 1 repost

Lucas Atkins

@lucasatkins7

3 months ago

The last two days have been a whirlwind, and I haven’t had a chance to read this end to end - though I did see an early draft - let alone comment. I’m one of the few people outside DatologyAI fortunate enough to have seen these results firsthand, and everyone can experience

66 likes, 0 replies, 12 reposts

DatologyAI

@datologyai

3 months ago

Battle of the AI Models: DatologyAI is thrilled to host an event for #SFTechWeek! RSVP here: lnkd.in/gEAMcj8u Step onto the frontlines of the AI battle with Ari Morcos, CEO of DatologyAI, @LucasAtkins7, CTO of Arcee AI, and Johannes Hagemann, Co-Founder and CTO of Prime

23 likes, 1 reply, 7 reposts

Matthew Leavitt

@leavittron

a month ago

If you were impacted by the recent Meta layoffs (or even if you weren't) and you're interested in doing ambitious, rigorous science and/or engineering that powers a real product that actual customers pay us ca$h money for, please DM me or head over to datologyai.com/careers.

21 likes, 0 replies, 10 reposts

Vineeth

@vineethdorna

a month ago

Absolutely loved this blog! There are so many reasons why a loss might not converge, and just as many places one could easily overlook. Analyzing all the possibilities and diving down to the kernel-level details is so much fun!

4 likes, 0 replies, 1 repost