Daniel van Strien
@vanstriendaniel
Machine Learning Librarian @huggingface 🤗 | Championing Open Science & ML | Sharing the latest ML datasets 🌟 | Tips for mastering the HF Hub
ID:2828117077
23-09-2014 13:43:54
3,0K Tweets
2,6K Followers
1,4K Following
Hugging Face PRO or Enterprise user?
Then you can combine the recent distilabel 1.0.0 and Llama 3 releases to generate a synthetic text dataset using serverless endpoints 🤗
Did you know that Argilla and distilabel datasets have been downloaded over 6 million times on the Hub? 🤯
Now distilabel datasets are even easier to identify thanks to the new icon on the Hugging Face Hub, a nice addition to yesterday's release!
github.com/argilla-io/dis…
Could DPO-style preference data be crucial for enhancing open LLMs across different languages?
Leveraging a 7k-example preference dataset, Argilla, Hugging Face, and KAIST AI used KAIST AI's ORPO technique with the latest Mistral AI MoE model to create a very high-performing…
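The tweet names ORPO but does not describe it. As a rough sketch, assuming the formulation from the ORPO paper (a standard SFT loss on the chosen response plus an odds-ratio penalty, with sequence probabilities length-normalised as the mean per-token log-probability), the loss for one DPO-style preference pair (prompt, chosen, rejected) could be computed as below. The helper names and the weight `lam=0.1` are hypothetical illustrations; real training would operate on model tensors in a training library, not Python lists.

```python
import math


def mean_logprob(token_logprobs):
    """Length-normalised log P(y|x): the mean of per-token log-probabilities."""
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logp):
    """log odds(y|x) = log P - log(1 - P), where P = exp(avg_logp).

    Requires avg_logp < 0 so that P < 1 and log(1 - P) is defined.
    """
    if avg_logp >= 0:
        raise ValueError("average log-probability must be negative")
    return avg_logp - math.log1p(-math.exp(avg_logp))


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(chosen_token_logprobs, rejected_token_logprobs, lam=0.1):
    """Sketch of the ORPO objective for a single preference pair.

    lam is a hypothetical weighting for the odds-ratio term.
    """
    chosen = mean_logprob(chosen_token_logprobs)
    rejected = mean_logprob(rejected_token_logprobs)
    # SFT term: negative (length-normalised) log-likelihood of the chosen response
    l_sft = -chosen
    # Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected))
    l_or = -log_sigmoid(log_odds(chosen) - log_odds(rejected))
    return l_sft + lam * l_or


# Usage: made-up per-token log-probs for a chosen and a rejected response.
chosen_lp = [-0.5, -0.7]
rejected_lp = [-1.2, -1.5]
print(orpo_loss(chosen_lp, rejected_lp))  # lower when the chosen answer is more likely
print(orpo_loss(rejected_lp, chosen_lp))  # higher when the preference is reversed
```

The odds-ratio term shrinks as the model assigns higher odds to the chosen response than to the rejected one, so no separate frozen reference model is needed, unlike DPO.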
The DS-1000 (ds1000-code-gen.github.io) code generation dataset now has a simplified format and is hosted on Hugging Face Datasets.
1⃣ Simplified format: github.com/xlang-ai/DS-10…
2⃣ DS-1000 Hugging Face: huggingface.co/datasets/xlang…
Credits: Yuhang Lai and Sida Wang