Highlights from 2025:
33 articles were published on In-Mind.
The article with the most interactions on Bluesky was “Language models: A new perspective on language and cognition” by @boevesam.bsky.social.
Read the post here again: bsky.app/profile/in-m...
What we published
In 2025, we published 33 articles and 12 blog posts.
@boevesam.bsky.social made this interactive visualisation to get a feeling for word predictability:
๐ wordpredictabilityvisualized.vercel.app
Curious how these predictability indices were obtained? Find out in our new paper!
doi.org/10.3758/s134...
#Reading #LargeLanguageModels #MECO
Want to explore word predictability yourself? Check out this app, which includes a sample of each corpus used in this work:
wordpredictabilityvisualized.vercel.app
Modelling reading times in Dutch?
gpt2-small-dutch (huggingface.co/GroNLP/gpt2-...) or gpt2-medium-dutch-embeddings (huggingface.co/GroNLP/gpt2-...) are great options.
3. Predictability effects are also logarithmic in Dutch, corroborating effects found in English (= linear effect of surprisal):
For very unpredictable words, a decrease in predictability has a much larger slowing-down effect on reading times than the same decrease for highly predictable words.
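To make the logarithmic claim concrete, here is a minimal sketch (illustrative numbers, not from the paper): surprisal is -log2 of a word's conditional probability, so the same absolute drop in probability costs far more bits, and hence predicts a far larger slowdown, at the unpredictable end of the scale.

```python
import math

def surprisal(p):
    """Surprisal in bits: -log2 of a word's conditional probability."""
    return -math.log2(p)

# The same absolute decrease in predictability (0.04) at both ends of the scale.
high = surprisal(0.90) - surprisal(0.94)  # a predictable word gets less predictable
low = surprisal(0.01) - surprisal(0.05)   # an unpredictable word gets less predictable

# If reading time is linear in surprisal, the change at the unpredictable
# end should slow reading down far more than the same change at the top.
print(round(high, 3))  # ~0.063 bits
print(round(low, 3))   # ~2.322 bits
```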
2. Language-specific models are generally better than multilingual ones (multilingual models are shown in blue in the figure below).
Key findings
1. Smaller Dutch models often predict reading times better (= inverse scaling trend), in line with evidence from English models.
But with more context (in a book-reading corpus), larger models catch up.
Large language models are powerful tools for psycholinguistic research.
But most evidence so far is limited to English.
How well do Dutch open-source language models fit reading times using their word predictability estimates?
B&B is open for business
Not a new career move: the next Boeve & Bogaerts paper is out in Behavior Research Methods!
doi.org/10.3758/s134...
@bogaertslab.bsky.social
Playback at #teap2025 (Part 1)
Thanks to the amazing speakers and the audience for the successful symposium on #languagemodels in #psycholinguistics!
Katharina Menn, @hannawoloszyn.bsky.social, @boevesam.bsky.social, Marco Marelli, Fritz Günther, @benjamingagl.bsky.social
Proud PI! At #TeaP2025, two lab members presented their work:
Haoyu Zhou in the symposium #StatisticalLearning and its Role in #Language and #Reading acquisition.
@boevesam.bsky.social in the symposium From Babies to Semantics: Leveraging #LanguageModels for #Psycholinguistic Research.
Overall, our results provide a psychometric leaderboard of Dutch large language models, ideal for researchers interested in effects of predictability in Dutch.
Check out our full dataset and code here:
osf.io/wr4qf/
Finally, we found a linear link between surprisal and reading times, except for the GECO corpus, where a non-linear link fitted the data best.
A challenge to the notion of a universal linear effect of surprisal.
Second, smaller Dutch models showed a better fit to reading times than the largest models, replicating the inverse scaling trend seen in English.
However, this effect varied depending on the corpus used.
First, across three eye-tracking corpora, we found that in each case Dutch LLMs' surprisal estimates outperformed the multilingual model (mGPT) and the N-gram model in predicting reading times.
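As an illustration of this kind of comparison (a toy sketch with synthetic data and made-up coefficients, not the paper's pipeline), one way to score a model is by how much its surprisal estimates improve a regression on reading times over a baseline of word length and frequency:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic predictors: word length, log frequency, and model surprisal.
length = rng.integers(2, 12, n).astype(float)
log_freq = rng.normal(10, 2, n)
surprisal = rng.gamma(shape=2.0, scale=3.0, size=n)

# Synthetic reading times (ms): linear in surprisal, plus baseline effects and noise.
rt = 180 + 8 * length - 4 * log_freq + 12 * surprisal + rng.normal(0, 25, n)

def r_squared(X, y):
    """R^2 of an ordinary-least-squares fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

baseline = r_squared(np.column_stack([length, log_freq]), rt)
with_surprisal = r_squared(np.column_stack([length, log_freq, surprisal]), rt)

# The gain in fit from adding surprisal is one common psychometric score.
print(with_surprisal - baseline)
```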
3. Does surprisal still show a linear link with reading times when estimated with a Dutch-specific language model, as opposed to a multilingual model?
2. Do these Dutch-specific LLMs show the same inverse scaling trend as English models?
That is, do the smaller transformer models' surprisal estimates account better for reading times than those of the very large models?
1. What is the best computational method for estimating word predictability in Dutch?
We compare 14 Dutch large language models (LLMs), a multilingual model (mGPT), and an N-gram model in their ability to explain reading times.
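For intuition about the N-gram baseline, here is a toy add-one-smoothed bigram model (illustrative mini-corpus and words, not the paper's training data): a word's predictability is estimated from how often it followed the previous word.

```python
import math
from collections import Counter

# Toy training corpus (illustrative only).
corpus = "de kat zat op de mat de hond zat op de bank".split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab = len(unigrams)

def bigram_surprisal(prev, word):
    """Surprisal in bits under an add-one-smoothed bigram model."""
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
    return -math.log2(p)

# "mat" followed "de" once in training; an unseen continuation is more surprising.
print(bigram_surprisal("de", "mat"))
print(bigram_surprisal("de", "fiets"))  # unseen -> higher surprisal
```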
The effect of word predictability on reading times is well established for English but not so much for Dutch.
We addressed this and asked three questions: