
Pratyush Maini
@pratyushmaini
Data Quality x Privacy | PhD student @mldcmu | Founding Member @datologyai | Prev. Comp Sc @iitdelhi
🦋: bsky.app/profile/pratyu…
ID: 1191440736517939200
http://pratyushmaini.github.io 04-11-2019 19:43:22
562 Tweets
1.1K Followers
418 Following


Launch day! Take a look at the phenomenal insights into what production-grade synthetic data looks like - driven by Pratyush Maini and co




Big day for the DatologyAI team! We introduce BeyondWeb, scaling synthetic data for trillion-scale pretraining! ✨ Collect more → curate better synthetic → win big ✨ Not all synthetic data is created equal → doing it right pays off. ✨ With only targeted synthetic data


As we hit the limits of real web-scale data, DatologyAI's synthetic data shows how we can leverage the models we've already trained to squeeze even more value out of limited data! Huge props to Pratyush Maini for leading this work 👏🚀


Great work by the DatologyAI team, enjoyed reading this

Thrilled to see BeyondWeb launched 🚀 Phenomenal insights and a huge step forward for scaling high-quality synthetic data to trillions of tokens. Amazing work by Pratyush Maini and the DatologyAI team - super excited to be learning from you all!

Pratyush Maini and DatologyAI make synthetic data seem easy, but it's really just how good they are



Finalizing the magic seed for synthetic data generation while the data dawg Ricardo Monti showers his blessings in the background.

After months of development, we finally share with the world some hard-earned science behind synthetic data. DatologyAI presents BeyondWeb — a SOTA approach showing how thoughtful synthetic data design can beat strong baselines.

The last two days have been a whirlwind, and I haven’t had a chance to read this end to end - though I did see an early draft - let alone comment. I’m one of the few people outside DatologyAI fortunate enough to have seen these results firsthand, and everyone can experience