Natalia (@natalakiou)'s Twitter Profile
Natalia

@natalakiou

Computational linguist. Working on Argilla @HuggingFace. PhD @crewsproject @CamClassics.

ID: 163187800

Link: https://www.youtube.com/c/Alfabetika · Joined: 05-07-2010 19:37:27

1.1K Tweets

358 Followers

487 Following

William J.B. Mattingly (@wjb_mattingly)

So, here's something I didn't expect to work as well as it does. You can apparently take line-level image transcriptions, create synthetic manuscript pages, and then fine-tune Qwen 2 VL on that data. I fine-tuned on 2,000 synthetic samples. Preparing a larger test on 100k images now.
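The tweet does not say how the synthetic pages were assembled. As a rough illustration only, here is a minimal Python sketch assuming one simple approach: line crops are stacked vertically with blank spacing, and their transcriptions are joined with newlines so the page-level target text follows the same line order. `compose_page` and `make_synthetic_pages` are hypothetical helpers, and images are modelled as plain lists of grayscale values in place of a real image library.

```python
import random
from typing import List, Tuple

# Hypothetical representation: a line image is a 2D list of grayscale pixels (0-255).
Image = List[List[int]]
Line = Tuple[Image, str]  # (line image, its transcription)


def compose_page(lines: List[Line], page_width: int,
                 line_gap: int = 2, background: int = 255) -> Tuple[Image, str]:
    """Stack line crops top to bottom into one synthetic page and join
    their transcriptions so the target text matches the visual line order."""
    page: Image = []
    texts: List[str] = []
    for i, (img, text) in enumerate(lines):
        if i > 0:
            # Insert blank background rows between consecutive lines.
            page.extend([[background] * page_width for _ in range(line_gap)])
        for row in img:
            if len(row) > page_width:
                raise ValueError("line image is wider than the page")
            # Right-pad each row with background pixels up to the page width.
            page.append(row + [background] * (page_width - len(row)))
        texts.append(text)
    return page, "\n".join(texts)


def make_synthetic_pages(pool: List[Line], n_pages: int, lines_per_page: int,
                         page_width: int, seed: int = 0) -> List[Tuple[Image, str]]:
    """Build n_pages pages, each from lines_per_page distinct lines sampled
    from the pool (requires lines_per_page <= len(pool))."""
    rng = random.Random(seed)
    return [compose_page(rng.sample(pool, lines_per_page), page_width)
            for _ in range(n_pages)]


line_a = ([[0, 0, 0]], "In principio")
line_b = ([[0, 0]], "erat verbum")
page, target = compose_page([line_a, line_b], page_width=4, line_gap=1)
```

Each resulting (page, target) pair could then serve as one image/text training example; how the original experiment formatted these for Qwen 2 VL is not stated in the tweet.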

Ben Burtenshaw (@ben_burtenshaw)

Last week we built a few custom data labelling interfaces with Argilla and shared them in this blog post. Check it out if you're building LLM applications for tasks like code review, image preferences, agent trace review, translation review, and copy editing.

Natalia (@natalakiou)

A super interesting talk that explains really well the current situation and the challenges of AI and language technologies: ub.edu/ubtv/video/qua…

Natalia (@natalakiou)

Great and safe AI is built with great data ✨ Check out Clémentine Fourrier 🍊's evaluation guidebook, where Ben Burtenshaw and I have included some practical tips for building your evaluation dataset with human annotations.

Natalia (@natalakiou)

Inspired by the latest events in Valencia, I'd like to show you how I used the "Disaster Response Messages" dataset: I uploaded a CSV file into Argilla to quickly start annotating and identifying pleas for help. No code needed. loom.com/share/952c157c…

Natalia (@natalakiou)

Every time we face a catastrophe, it's the workers who lift the country up, and they're the ones we applaud. They do it on every other day too, in silence and without recognition. Hats off to all the workers of this country.

Natalia (@natalakiou)

Back to work after a week-long offsite in Martinique 🏝️ with my colleagues from Hugging Face 🤗 ! I had time to relax, reflect, have fun and meet people who aren't just amazing at their work but also truly kind 💖 Can't wait for the next one!

Natalia (@natalakiou)

You don't need to know how to code or train models to help build better AI models for your language! If you'd like to get high-quality data for your language, check if yours is listed in this form and sign up! forms.gle/DHJdtvoSNxAAtA…

Natalia (@natalakiou)

I'm still looking for Leads who can help us reach more people to annotate Latin, Ancient Greek, Esperanto and Lingua Franca Nova! I'd be very sad if we didn't get enough data for these ones 🥲 Sign up here! 👇

Natalia (@natalakiou)

If you are still wondering how FineWeb2 annotations are done, how to follow the guidelines or how Argilla works, this is your video! I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out! buff.ly/3VHocdl