Vineeth (@vineethdorna)'s Twitter Profile
Vineeth

@vineethdorna

ID: 1452315713880920074

24-10-2021 16:47:26

3 Tweets

18 Followers

218 Following

Ari Morcos (@arimorcos)

Andrej Karpathy, this is our exclusive focus at DatologyAI. Data quality is the single most underinvested area of ML research relative to its impact. We've already been able to achieve 10x efficiency gains over open-source datasets, and I'm confident there's still another 100x because there's

Matthew Leavitt (@leavittron)

It depends on how much you know about what you're using your model for. You want your data to be as similar to your test distribution as possible. In practice, benchmarks are an incomplete description of your true test distribution, so you want to hedge diversity vs.

sijia.liu (@sijialiu17)

🚨 Excited to attend #ICML2025 and share our latest work (OPTML @MSU) on LLM unlearning -- think of it as AI surgery: removing harmful knowledge while preserving general utility. Catch us at: 🔹 [Paper 1] Tues, July 15 @ 4:30pm PT | E-1108 📄 Invariance Makes LLM Unlearning

Matthew Leavitt (@leavittron)

I used to claim that the “ML” in ICML stands for “Matthew Leavitt”. But this list of papers makes it hard to deny that the “M” in ICML actually stands for “Maini”

Vaidehi Patil (@vaidehi_patil_)

The MUGen workshop at #ICML2025 is happening now! Stop by for talks on adversarial ML, unlearning as rational belief revision, failure modes in unlearning, robust LLM unlearning, and the bright vs. dark side of forgetting in generative AI!

Matthew Leavitt (@leavittron)

Very excited to announce BeyondWeb, @datologyAI’s synthetic pretraining data generation paradigm. BeyondWeb is a rephrasing-based approach that substantially outperforms existing public synthetic pretraining data baselines, and is a core part of our curation pipeline.

Amro (@amrokamal1997)

📖 Blog: blog.datologyai.com/beyondweb 📄 Arxiv: arxiv.org/abs/2508.10975 Shout-out to all the Datologists for this work, especially Pratyush Maini and Vineeth

Lucas Atkins (@lucasatkins7)

The last two days have been a whirlwind, and I haven’t had a chance to read this end to end - though I did see an early draft - let alone comment. I’m one of the few people outside DatologyAI fortunate enough to have seen these results firsthand, and everyone can experience

DatologyAI (@datologyai)

Battle of the AI Models: DatologyAI is thrilled to host an event for #SFTechWeek! RSVP here: lnkd.in/gEAMcj8u Step onto the frontlines of the AI battle with Ari Morcos, CEO of DatologyAI, @LucasAtkins7, CTO of Arcee AI, and Johannes Hagemann, Co-Founder and CTO of Prime

Matthew Leavitt (@leavittron)

If you were impacted by the recent Meta layoffs (or even if you weren't) and you're interested in doing ambitious, rigorous science and/or engineering that powers a real product that actual customers pay us ca$h money for, please DM me or head over to datologyai.com/careers.

Vineeth (@vineethdorna)

Absolutely loved this blog! There are so many reasons why a loss might not converge and just as many places one could easily overlook. Analyzing all the possibilities and diving down to the kernel level details is so much fun!