Posts by Natalia
[Image: Screenshot of the Introduction to Argilla in Chapter 10 of the Hugging Face NLP course]
New chapter in the Hugging Face NLP course!
We've added a new chapter about the very basics of Argilla to the Hugging Face NLP course. Learn how to set up an Argilla instance, load and annotate datasets, and export them to the Hub.
Any feedback for improvements is welcome!
Argilla v2.6.0 is here!
Let me show you how EASY it is to export your annotated datasets from Argilla to the Hugging Face Hub.
Take a look at this quick demo.
More info about the release at github.com/argilla-io/a...
#AI #MachineLearning #OpenSource #DataScience #HuggingFace #Argilla
Links:
- Reach out to us: https://buff.ly/4gC2f7p
- Do some annotations: https://buff.ly/4gFuguL
- Not sure how to annotate? See this video guide: https://buff.ly/4gJ9Xg9
I'm taking a well-deserved break to celebrate Christmas, but the FineWeb2 annotation sprint continues!
You can still contribute some annotations or start leading a language!
If you are still wondering how the FineWeb2 annotations are done, how to follow the guidelines or how Argilla works, this is your video!
I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out!
The FineWeb2 collaborative annotation sprint is also a way of keeping many languages alive. I talk about it in this LinkedIn post: https://buff.ly/49DghmN
I've just contributed 142 examples to this dataset:
data-is-better-together-fineweb-c.hf.space/share-your-p...
Sure! We do have multiple leads for some languages! You don't need to be a lead to collaborate, though. You can also contribute annotations once we launch the annotation space. If you'd still like to lead, send me a private message and I'll sign you up.
Thanks, @rasgaard.bsky.social! Looking forward to this!
Thanks! The best thing you can do is stay tuned and contribute some annotations in the Spanish split once we launch!
Check if we're still looking for leads in your language: nataliaelv-language-leads-dashboard.hf.space
Sign up: forms.gle/opx2CZUEza1r...
[Image: Screenshot of a dashboard showing the number of languages with a lead and languages without a lead]
Next week we're launching a collaborative annotation effort to build a big multilingual dataset, so you can have high-quality data in your language.
We are really close to getting leads for 100 languages! Can you help us cover the remaining 200?
I just wanted to share a few thoughts about the latest Argilla release, 2.5.0, as it's a pretty big one!
Argilla now has full support for webhooks, which means you can do some pretty cool stuff, like training models on the fly as annotations are created.
#MachineLearning #NLP #DataLabeling
Just wanted to say that I'm sorry about my previous post. I was supporting a colleague who was sharing that his work was trending without being aware that it was harmful. I deleted the previous post a bit hastily to stop incoming insults. I'm sorry and will be more careful next time.
This is what you get on Bluesky when your feeds are linguistics and otters.
At @huggingface.bsky.social we're preparing a collaborative annotation effort to build an open-source multilingual dataset.
If you'd like to get high-quality open data for your language, check if yours is listed in this form and sign up!
forms.gle/DHJdtvoSNxAA...
We've updated the list and it should be there now! (Until we find a lead for the language of course!)
The list is updated and Japanese is in there!
Periodic reminder: a lot of what makes AI "work" is exploited people doing the tasks, just hidden behind fancy websites.
It's good that a normie outlet like 60 Minutes is reporting on this.
I created a collection of good models for dataset curation:
- NSFW classifiers
- PII classifiers
- blazing-fast embeddings by model2vec
- quality classifier
- educational value classifier
- domain classifier
Collection: huggingface.co/collections/...
I like posting about super-high-quality data curation for AI, languages (modern and ancient!) and linguistics.
If you'd like to follow my work on other platforms, you can find more links here: buff.ly/3OiuHPH
Hello everyone! Since this is growing quite a bit, I thought I'd introduce myself:
I'm Natalia, a computational linguist working at @huggingface.bsky.social as part of the team building Argilla.
That's so cool!
Back to work after a week-long offsite in Martinique with my colleagues from @huggingface.bsky.social!
I had time to relax, reflect, have fun and meet people who aren't just amazing at their work but also truly kind.
Can't wait for the next one!
What's your strategy to save interesting posts and not forget about their existence?
If you're nerdy about language, there are lots of really interesting people in here!
go.bsky.app/UUM7Gcx
Hello bsky! As a welcome post, and inspired by the latest events in Valencia, I'd like to show you how I uploaded the "Disaster Response Messages" dataset as a CSV file into Argilla to quickly start annotating it and identifying pleas for help. No code needed.
www.loom.com/share/952c15...