Natalia (@nataliaelv.hf.co) Bsky

Introduction to Argilla - Hugging Face NLP Course We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Start learning here: huggingface.co/learn/nlp-co...

1 year ago 2 1 0 0

Screenshot of the Introduction to Argilla in Chapter 10 of the Hugging Face NLP course

New chapter in the Hugging Face NLP course! 🤗 🚀

We've added a new chapter about the very basics of Argilla to the Hugging Face NLP course. Learn how to set up an Argilla instance, load & annotate datasets, and export them to the Hub.

Any feedback for improvements welcome!

1 year ago 14 1 1 0

🚀 Argilla v2.6.0 is here! 🎉

Let me show you how EASY it is to export your annotated datasets from Argilla to the Hugging Face Hub. 🤩

Take a look at this quick demo 👇

💁‍♂️ More info about the release at github.com/argilla-io/a...

#AI #MachineLearning #OpenSource #DataScience #HuggingFace #Argilla

1 year ago 11 5 0 1

Discussion Forum - a Hugging Face Space by HuggingFaceFW Discover amazing ML apps made by the community

Links:
- Reach out to us: https://buff.ly/4gC2f7p
- Do some annotations: https://buff.ly/4gFuguL
- Not sure how to annotate? See this video guide: https://buff.ly/4gJ9Xg9

1 year ago 0 0 0 0

I'm taking a well-deserved break to celebrate Christmas 🎄 ☃️ but the FineWeb2 annotation sprint continues!

You can still contribute some annotations or start leading a language!

1 year ago 3 0 1 1

FineWeb2 collaborative sprint: how to annotate In this video you'll learn how you can go about annotating some records in the FineWeb2 collaborative annotation sprint launched by Hugging Face and Argilla....

If you are still wondering how the FineWeb2 annotations are done, how to follow the guidelines or how Argilla works, this is your video!

I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out!

1 year ago 2 0 0 0

The FineWeb2 collaborative annotation sprint is also a way of keeping many languages alive. I talk about it in this LinkedIn post: https://buff.ly/49DghmN

1 year ago 3 0 0 0

lat - Lingua latina - Latin Join and contribute to the dataset lat - Lingua latina - Latin

I've just contributed 142 examples to this dataset:

data-is-better-together-fineweb-c.hf.space/share-your-p...

1 year ago 2 0 0 0

Sure! We do have multiple leads for some languages! You don't need to be a lead to collaborate, though. You can also contribute with annotations once we launch the annotation space 🚀 If you'd still like to lead, send me a private message and I'll sign you up 🤗

1 year ago 1 0 1 0

Thanks @rasgaard.bsky.social ! Looking forward to this!

1 year ago 0 0 0 0

Thanks! 🤗 The best thing you can do is stay tuned and contribute some annotations in the Spanish split once we launch! 🚀

1 year ago 0 0 0 0

Gradio

Check if we're still looking for leads in your language: nataliaelv-language-leads-dashboard.hf.space

Sign up: forms.gle/opx2CZUEza1r...

1 year ago 1 0 1 0

Screenshot of a dashboard showing the number of languages with a lead and languages without a lead

Next week we're launching a collaborative annotation effort to build a big multilingual dataset, so you can have high-quality data in your language.

We are really close to getting leads for 100 languages! Can you help us cover the remaining 200?

1 year ago 15 4 4 0

🙌 I just wanted to share a few thoughts about the latest Argilla release, 2.5.0, as it's a pretty big one!

Argilla now has full support for webhooks, which means you can do some pretty cool stuff, like model training on the fly as annotations are created. 🤯

#MachineLearning #NLP #DataLabeling

1 year ago 5 3 1 0

Just wanted to say that I'm sorry about my previous post. I was supporting a colleague who was sharing that his work was trending without being aware that it was harmful. I deleted the previous post a bit hastily to stop incoming insults. I'm sorry and will be more careful next time.

1 year ago 1 0 0 0

This is what you get in Bluesky when your feeds are Linguistics and otters 🦦😍

1 year ago 4 0 0 0

Language Lead sign-up At Hugging Face 🤗, we're launching a big community initiative to improve LLM training for many languages. We're looking for Language Leads to help us cultivate specific languages during this initiativ...

At @huggingface.bsky.social 🤗 we're preparing a collaborative annotation effort to build an open-source multilingual dataset.

If you'd like to get high-quality open data for your language, check if yours is listed in this form and sign up!
forms.gle/DHJdtvoSNxAA...

1 year ago 31 9 4 0

We've updated the list and it should be there now! (Until we find a lead for the language of course!)

1 year ago 2 0 1 0

The list is updated and Japanese is in there!

1 year ago 2 0 2 0

Periodic reminder: a lot of what makes AI "work" is exploited people doing the tasks, just hidden behind fancy websites.

It's good that a normie outlet like 60 Minutes is reporting on this.

1 year ago 1861 1010 17 26

Models for dataset curation - a Dataset-Tools Collection We’re on a journey to advance and democratize artificial intelligence through open source and open science.

I created a collection with good models for dataset curation

- NSFW classifiers
- PII classifiers
- blazing fast embeddings by model2vec
- quality classifier
- educational value classifier
- domain classifier

Collection: huggingface.co/collections/...

1 year ago 82 7 8 1

About me Hi! 👋 I’m Natalia, a Computational Linguist from Madrid (Spain) working at Hugging Face 🤗. I’m passionate about languages and curating high-quality data for AI.

I like posting about super-high-quality data curation for AI, languages (modern and ancient!) and linguistics.

If you'd like to follow my work on other platforms, you can find more links here: buff.ly/3OiuHPH

1 year ago 6 0 0 0

Hello everyone! 👋 Since this is growing quite a bit, I thought I'd introduce myself:

I'm Natalia, a computational linguist working at @huggingface.bsky.social as part of the team building Argilla.

1 year ago 28 0 7 0

That's so cool!

1 year ago 1 0 0 0

Back to work after a week-long offsite in Martinique 🏝️ with my colleagues from @huggingface.bsky.social 🤗 !

I had time to relax, reflect, have fun and meet people who aren't just amazing at their work but also truly kind 💖

Can't wait for the next one!

1 year ago 7 1 0 0

What's your strategy to save interesting posts and not forget about their existence?

1 year ago 0 0 0 0

If you’re nerdy about language, there are lots of really interesting people in here!

go.bsky.app/UUM7Gcx

1 year ago 64 19 1 3

Annotating and Curating Datasets in Argilla Hello, I'm Natalia from the Argilla team at Hugging Face. Today, I'll guide you through annotating and curating datasets in Argilla without coding. I demonstrate using a disaster response dataset to...

Hello bsky! As a welcome post and inspired by the latest events in Valencia, I'd like to show you how I uploaded the "Disaster Response Messages" dataset as a CSV file into Argilla to quickly start annotating and identify pleas for help. No code needed.
www.loom.com/share/952c15...

1 year ago 6 1 0 0
