Natalia (@natalakiou) Twitter Tweets • TwiCopy

Natalia

@natalakiou


Computational linguist. Working on Argilla @HuggingFace. PhD @crewsproject @CamClassics.

ID: 163187800

Link: https://www.youtube.com/c/Alfabetika

Joined: 05-07-2010 19:37:27

1.1K Tweets

358 Followers

487 Following

William J.B. Mattingly

@wjb_mattingly

a year ago

So, here's something I didn't expect to work as well as it does. You can apparently take line-level image transcriptions, create synthetic manuscript pages and then fine-tune Qwen 2 VL on that data. I fine-tuned on 2,000 synthetic data. Preparing a larger test on 100k images now.

35 likes · 3 replies · 6 reposts

Ben Burtenshaw

@ben_burtenshaw

a year ago

Last week we built a few custom data labelling interfaces with Argilla and shared them in this blog post. Check it out if you're building LLM applications for tasks like code review, image preferences, agent trace review, translation review, and copy editing.

15 likes · 1 reply · 6 reposts

Natalia

@natalakiou

a year ago

A really interesting talk that does a great job of explaining the current state and challenges of AI and language technologies: ub.edu/ubtv/video/qua…

0 likes · 0 replies · 0 reposts

Natalia

@natalakiou

a year ago

Great and safe AI is built with great data ✨ Check out Clémentine Fourrier 🍊's evaluation guidebook, where Ben Burtenshaw and I have included some practical tips for building your evaluation dataset with human annotations.

3 likes · 0 replies · 0 reposts

Natalia

@natalakiou

a year ago

Inspired by the latest events in Valencia, I'd like to show you how I used the "Disaster Response Messages" dataset to upload a CSV file into Argilla to quickly start annotating and identify pleas for help. No code needed. loom.com/share/952c157c…

5 likes · 0 replies · 4 reposts

Natalia

@natalakiou

a year ago

Lost in Translation, the short film:

0 likes · 0 replies · 0 reposts

Natalia

@natalakiou

a year ago

Every time we have a catastrophe, it is the workers who lift the country back up, and they are the ones we applaud. They do it on all the other days too, in silence, without recognition. Bravo to all the working men and women of this country.

1 like · 0 replies · 0 reposts

Natalia

@natalakiou

a year ago

I also made the move:

0 likes · 0 replies · 0 reposts

Natalia

@natalakiou

a year ago

Back to work after a week-long offsite in Martinique 🏝️ with my colleagues from Hugging Face 🤗 ! I had time to relax, reflect, have fun and meet people who aren't just amazing at their work but also truly kind 💖 Can't wait for the next one!

2 likes · 0 replies · 0 reposts

Natalia

@natalakiou

a year ago

You don't need to know how to code or train models to help build better AI models for your language! If you'd like to get high-quality data for your language, check if yours is listed in this form and sign up! forms.gle/DHJdtvoSNxAAtA…

2 likes · 0 replies · 0 reposts

Natalia

@natalakiou

a year ago

I'm still looking for Leads who can help us reach more people to annotate Latin, Ancient Greek, Esperanto and Lingua Franca Nova! I'd be very sad if we didn't get enough data for these ones 🥲 Sign up here! 👇

1 like · 0 replies · 1 repost

Natalia

@natalakiou

a year ago

I've just contributed 142 examples to this dataset: …is-better-together-fineweb-c.hf.space/share-your-pro…

5 likes · 1 reply · 1 repost

Natalia

@natalakiou

a year ago

If you are still wondering how FineWeb2 annotations are done, how to follow the guidelines or how Argilla works, this is your video! I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out! buff.ly/3VHocdl

2 likes · 0 replies · 0 reposts