Daniel van Strien (@vanstriendaniel)'s Twitter Profile
Daniel van Strien

@vanstriendaniel

Machine Learning Librarian @huggingface 🤗 | Championing Open Science & ML | Sharing the latest ML datasets 🌟 | Tips for mastering the HF Hub

ID:2828117077

Joined: 23-09-2014 13:43:54

3.0K Tweets

2.6K Followers

1.4K Following

Victor Sanh (@SanhEstPasMoi)

Can't wait to see multimodal Llama 3!

We released a resource that might come in handy: The Cauldron🍯

The Cauldron is a massive manually-curated collection of 50 vision-language sets for instruction fine-tuning. 3.6M images, 30.3M query/answer pairs.

It covers a large

Gabriel Martín Blázquez (@gabrielmbmb_)

Hugging Face PRO or Enterprise user?

Then you can combine the recent distilabel 1.0.0 and Llama 3 releases to generate a synthetic text dataset using serverless endpoints 🤗

Alvaro Bartolome (@alvarobartt)

🦫 We have just released `Capybara-Preferences` in collaboration with KAIST AI and Hugging Face

It is a new synthetic preference dataset built using `distilabel` on top of the LDJ Capybara dataset.

More details 🧵

huggingface.co/datasets/argil…

Alexander Doria (@Dorialexander)

Big announcement: pleias releases a massive open corpus of 2 million YouTube videos under a Creative Commons (CC-BY) license on Hugging Face. Youtube-Commons features 30 billion words of audio transcriptions in multiple languages, with other modalities coming soon. huggingface.co/datasets/PleIA…

Argilla (@argilla_io)

Did you know that Argilla and distilabel datasets have over 6 million downloads on the Hub? 🤯

Now, distilabel datasets will be even easier to identify thanks to the new icon added to the Hugging Face Hub—a nice addition to yesterday's release!

github.com/argilla-io/dis…

Daniel van Strien (@vanstriendaniel)

We've just added a new icon to indicate datasets created using Argilla's Distilabel on the Hugging Face Hub!

Good data is vital for AI so I'm very excited to see the growing number of data tools integrating with the Hub 🚀

Argilla (@argilla_io)

💥After months of work, we're thrilled to introduce ⚗️distilabel 1.0.0!

🚀More flexible, robust, and powerful.

🙌 Let's empower the community to build the most impactful datasets for Open Source AI!

Blogpost: argilla.io/blog/introduci…
Github: github.com/argilla-io/dis…

Daniel van Strien (@vanstriendaniel)

Could DPO-style preference data be crucial for enhancing open LLMs across different languages?

Using a 7k preference dataset, Argilla, Hugging Face, and KAIST AI applied KAIST AI's ORPO technique to the latest Mistral AI MoE model to create a very high-performing

Victor Sanh (@SanhEstPasMoi)

New multimodal model in town: Idefics2!

💪 Strong 8B-parameter model: often on par with open 30B counterparts.
🔓 Open license: Apache 2.0.
🚀 Strong improvement over Idefics1: +12 points on VQAv2, +30 points on TextVQA while having 10x fewer parameters.
📚 Better data:

Nicolas Patry (@narsilou)

TGI 2.0 is out!

- Back to fully open source for good (Apache 2.0)
- Fastest inference server in existence (110 tok/s for Cohere R+, with Medusa speculation)
- FP8 support
- Mixtral 8x22B support! (also the fastest Medusa on the way)

And much more to come
github.com/huggingface/te…

Omar Sanseviero (@osanseviero)

Welcome Zephyr 141B to Hugging Chat🔥

🎉A Mixtral-8x22B fine-tune
⚡️Super fast generation with TGI
🤗Fully open source (from the data to the UI)

huggingface.co/chat/models/Hu…

Tao Yu (@taoyds)

The DS-1000 (ds1000-code-gen.github.io) code generation data format has been simplified, and the dataset is now hosted on Hugging Face Datasets.

1⃣Simplified format: github.com/xlang-ai/DS-10…
2⃣DS-1000 Hugging Face: huggingface.co/datasets/xlang…

Credits: Yuhang Lai and Sida Wang

Jiwoo Hong (@jiwoohong98)

🔥Mixtral-8x22B-base + ORPO🔥

7k data & 1.3 hours to build a strong human-aligned 140B chat model🦾

👉IFEval: 65%
👉BBH: 59%
👉MT-Bench: 8.17

More models will be added to the Zephyr-ORPO collection with Argilla and Hugging Face, stay tuned 😃
