#HPLT

Our experts contributed to the latest #HPLT dataset publication, which contains some very interesting results! See here: t.co/uN2zoSF251 #DataScience

Original post on sigmoid.social

You are also welcome to the "Multilingualism: from data crawling to evaluation" birds-of-a-feather (BoF) event, which is co-organized by the #HPLT project.

Join us to discuss web-scale text data collection and processing, as well as open multilingual #LLM training and evaluation. You will have […]

Original post on sigmoid.social

If you are attending the ACL 2025 conference in Vienna, come to the poster presenting the latest #HPLT v2 datasets (the paper is available here: https://arxiv.org/abs/2503.10267).

You can find the HPLT folks on Wednesday, July 30, 11:00 at the in-person poster session, Level 0, Exhibit Halls X4 […]


Reference LLMs from #HPLT and #OpenEuroLLM


Happy to share the first models I contributed to as part of the #HPLT + @openeurollm.bsky.social projects and the @turkunlp.bsky.social group :)


📢 First release: 38 monolingual reference LLMs (2.15B params) via #HPLT + #OpenEuroLLM

⚙️ Trained on 100B tokens from the HPLT v2 dataset
🌍 Cover EU langs + others
⚙️ Based on LLaMA, trained on #LUMI
📈 Useful for evaluation

Downloads + more info at openeurollm.eu/blog/hplt-oe...

Original post on sigmoid.social

1. "An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT)", describing a new generation of the #HPLT web-crawled corpora in 193 languages. LTG co-authors: Nikolay Arefyev, Mariia Fedorova, Andrey Kutuzov, Petter Mæhlum, Vladislav Mikhailov, Stephan Oepen […]


That's a wrap for @nodalida.bsky.social ! Short, nice and intense. I presented our work on efficient MT @helsinki-nlp.bsky.social within the #HPLT project⚡️
