Posts by TurkuNLP
#morningmeetings with nice people make mornings better! Also, coffee and tea help. What are your mornings like?
Researchers from TurkuNLP attended the workshop “Linguistic patterns of textual organization across register” at the DGfS conference in Trier, Germany (Feb 25–27). "One of the best conferences I've been to!" said one attendee. @unitrier.bsky.social 😎👍
www.dgfs2026.uni-trier.de #UniTrier #UTU
Our doctoral researcher Amanda Myntti studies how certain linguistic properties of texts affect the features of language models.
According to her, it is essential to pay attention to what kind of data a language model has been trained on.
👉 Read Myntti's commentary: www.utu.fi/fi/ajankohta...
Machine learning models learn to recognize different text genres based on their linguistic features. This is the finding of Liina Repo, MA, whose doctoral dissertation will be publicly examined on Fri, Jan 30.
Warm congratulations to the doctoral candidate! 🎩
#väitös #tutkimus #tiede #tekoäly #koneoppiminen #AI
tinyurl.com/2kn83usj
We are glad to have two visiting scholars, Karim Hemina and Florian Frenken, here to share their expertise! Today we got to listen to their presentations on the topics "Fake news detection on social networks" and "Dynamic text structure across online registers - A geometric multivariate approach".
“This compute allocation will allow the project to continue and expand its efforts to build the next generation of fully open LLMs for all European languages.” TurkuNLP member Sampo Pyysalo, technical lead in OpenEuroLLM.
TurkuNLP member Risto Luukkonen's MSc thesis has been selected as one of three contenders for the best AI thesis of the year by AI Finland! 🎉 The winner will be announced at the AI Gala next week. aifinland.fi/ai-gaala-202...
FAIR Science Café @csc.fi is an interactive online event where researchers present their work, highlighting the data they used or produced. On Nov 21, Tomasz Galica @turkunlp.bsky.social talks about developing and evaluating LLMs, training datasets and risks. Info & sign up: www.dariah.fi/event/fair-s...
Our experts contributed to the latest #HPLT dataset publication, which contains some very interesting results! See here: t.co/uN2zoSF251 #DataScience
The Research Council of Finland has selected the unit led by Virpi Lummaa, which studies human diversity, as a new Centre of Excellence (2026-2033). Veronika Laippala, Päivi Onkamo and Outi Vesakoski are part of the unit. Centres of Excellence in research are at the international forefront of their fields. www.utu.fi/fi/ajankohta...
Doctoral students from TurkuNLP, together with people from DigiTS Tartu, are planning a workshop on presentation skills specifically for DH researchers! We are grateful to #TurkuUniversityFoundation for the Villa Tammekann grant for hosting the upcoming workshop next autumn. ♥️ Looking forward to it!
Nojonen, Korsu, Ginter, Laippala & Kanerva (2025) introduce TCBLex, a lexical database of Finnish literary works read by children (ages 7-15). The data consist of 14 sub-lexicons and over 11 million tokens, annotated and lemmatized.
Paper: link.springer.com/article/10.3...
Data: doi.org/10.5281/zeno...
Two articles by TurkuNLP members have been published in a book about the linguistic landscape of Turku, except that Kupari & Lamberg (2025) and Ristilä (2025) have turned the tables and observed the "landscape in language". The book is available for free online here: oa.finlit.fi/books/e/10.2...
Our Doctoral Researcher Otto Tarkka (@ottotarkka.bsky.social) visited CSC facilities in Kajaani last month on a trip organized by FIN-CLARIAH. "It was great to meet new people and hear how CSC computers are used in a wide variety of research projects."
Our Latin expert, Hanna-Mari Kupari, presented at the Norwegian Institute in Rome on "Latin Across Registers: A Computational Analysis of Situational Language Use Reflected in Grammar". See the slides and abstract here:
github.com/HannaKoo/Nor...
Teimouri, Kanerva & Ginter (2025) published insights into model interpretability in their study of a multi-attention head model. They show that heads capture distinct semantics and that deeper layers enhance separation, but pooling can blur patterns: acl-bg.org/proceedings/...
Maryam from TurkuNLP participated in #RANLP2025 (Recent Advances in Natural Language Processing), and their team won a competition to build a hate speech classifier for five low-resource languages. 🏆 Congrats!
Close-up of Tapio Salakoski and Filip Ginter. Text over the image reads: Tiedelinja: Is AI a great opportunity or a fatal mistake?
What are our researchers' greatest hopes and worst fears about AI?
🎧 Listen to the latest episode of our Tiedelinja podcast, in which Professor of Data Analytics Filip Ginter and Vice Rector Tapio Salakoski discuss AI.
👉 Listen to the Tiedelinja podcast: www.utu.fi/fi/ajankohta...
TurkuNLP was at the Corpus Linguistics Conference 2025! #CL2025 Pictures of some of our participants by Hanna-Mari Kupari and Jiaqi Guo. Search the book of abstracts for "University of Turku" to read more about our contributions: drive.google.com/file/d/1TiwO... Thank you @cl2025.co.uk!
TurkuNLP leads the central work package on building LLMs within OpenEuroLLM.
openeurollm.eu/blog/LUMI-Ex...
Our recent paper examines the impact of register (genre) on LLM performance. Key points: news texts perform poorly in evaluation, while opinionated texts are among the best. We hope this work can be used to understand the impact of register on LLMs and improve training data mixes! arxiv.org/abs/2504.01542
TurkuNLP is now on Bluesky! 🎉