📢 We are very happy to have Jana Jung for our third session of the TADA Spring Speaker Series.
Jana will present recently published work on whether psychometric tests work for LLMs.
When? 15th of April, 5pm (Berlin time)
Where? Online (Sign up for the newsletter at tada.cool)
See you there!
Posts by Georg Ahnert
Are you using survey-style questionnaires designed for humans to measure characteristics of LLMs?
In our #EACL2026 paper, we evaluate both the reliability and validity of such tests and find that their scores do not reflect real-world model behavior. In fact, they can be deceptive!
🧵1/3
Happy to announce the 2nd edition of our Summer School in Computational Social Science that will take place in the beautiful Villa del Grumello on Lake Como between June 22-26, 2026!
*** DEADLINE FOR APPLICATION: February 15, 2026 (firm deadline) ***
More details here:
css2.lakecomoschool.org
👋🏼 I'm at #EMNLP2025 presenting "The Prompt Makes the Person(a): A Systematic Evaluation of Sociodemographic Persona Prompting for LLMs"
🕑 Thu. Nov 6, 12:30 - 13:30
📍 Findings Session 2, Hall C3
🚨New paper alert🚨
🤔 Ever wondered how the way you write a persona prompt affects how well an LLM simulates people?
In our #EMNLP2025 paper, we find that using interview-style persona prompts makes LLM social simulations less biased and more aligned with human opinions.
🧵1/7
Thrilled to talk about how seemingly small decisions in silicon sampling can have a large impact on simulated survey responses 👀 Join us on Oct 29th! 👈
Come join next Wednesday if you want to rant about society's love-hate relationship with LLMs!
👋 #ACL2025NLP 🇦🇹 @marlutz.bsky.social and I are presenting our poster on demographic representativeness of LLMs today!
🕦 10:30-12:00
📍 Hall X5 (board 1 or 14 according to different sources 🧐)
Here’s the paper on ACL anthology: aclanthology.org/2025.finding...
Drop by!
Here’s some of the slides 👇 bsky.app/profile/mstr...
Chair for Data Science in the Economic and Social Sciences at University of Mannheim having lots of fun at #ic2s2 @janajung.bsky.social @wanlo.bsky.social @indiiigo.bsky.social @jrupprec.bsky.social @maximiliankreutner.bsky.social and Stefano Balietti
Laura Nelson on stage presenting her keynote at IC2S2. The slide lays out "A maturing field" of Computational Qualitative Research
Really inspiring keynote by @lauraknelson.bsky.social this morning at #IC2S2 discussing when to model and when to generate societies—among many other themes in computational qualitative research!
Before heading to ACL, I'm excited to be at #IC2S2 this week! 🌞
I'll present a related working paper on validating LLM social simulations at the ABM session on Tuesday (11 AM, Vingen 7): indiiigo.github.io/files/GABM_V...
(w/ @wanlo.bsky.social @mstrohm.bsky.social and @janalasser.bsky.social)
Screenshot of our paper "Missing the Margins: A Systematic Literature Review on the Demographic Representativeness of LLMs"
Details about what we annotated in our systematic review
Do LLMs represent the people they're supposed to simulate or provide personalized assistance to?
We review the current literature in our #ACL2025 Findings paper and investigate what researchers conclude about the demographic representativeness of LLMs:
osf.io/preprints/so...
1/
LLMs can understand political discourse, but can they actually predict votes of real politicians?
Excited to share my work at #IC2S2 this week!
I will present my poster on Tuesday between 1:30 and 3:30 p.m.
The Swedish countryside as seen from a moving train, with a lake, a red and white house, and some cows.
The 15.07 train has a 30 min delay now but the landscape’s quite pretty ;)
Really excited to also present this work at #IC2S2 next week in Norrköping! 🎉 I'd love to discuss how to produce LLM survey responses at my poster on Wed at 13:30 (Poster Session 2, Poster ID 68) 📊
LLMs can generate synthetic survey responses, e.g. for imputation, but how reliable are they? 📋
At #IC2S2, I'll be sharing our research on the robustness of AI-generated responses to perturbations and on whether they mirror human survey biases. 🤖
Come by my poster on Tuesday between 1:30 and 3:30 p.m.
Very excited to head to #IC2S2 next week! 🎉
In our project, we tested whether a psychological assessment can measure sexism in LLMs, and found that applying such tools to LLMs is not as straightforward as it seems.
Find me and my poster at Poster Session 1 (Tue 12:30-14:30) — hope to see you there
A research setup for the evaluation of Answer Production Methods for closed-ended survey responses from LLMs. An LLM is prompted with a survey and an optional instruction, before an Answer Production Method is applied. These methods range from token probabilities to open-ended text generation + classification. I then evaluate them against human survey answers and calculate individual-level accuracy as well as distribution alignment for sub-populations.
LLMs are trained to produce open-ended responses 📝, but most survey items require closed-ended responses instead 📊
This Wed 11:00–12:30 at #ESRA25, I'll discuss the large impact that Answer Production Methods have on prediction results + share recommendations for methods and parameters. 👈
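A minimal sketch of the two evaluation measures named in the setup above, individual-level accuracy and distribution alignment. The answer options and the example data are hypothetical, and total variation distance is only one possible alignment measure, chosen here for illustration; the post does not say which measure was used.

```python
from collections import Counter


def accuracy(predicted, observed):
    """Share of respondents whose predicted answer matches their real answer."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def answer_distribution(answers, options):
    """Relative frequency of each answer option, in the order of `options`."""
    counts = Counter(answers)
    return [counts[o] / len(answers) for o in options]


def total_variation(p, q):
    """Total variation distance between two distributions (0 = identical)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical closed-ended item with three options (not from the paper).
options = ["agree", "neutral", "disagree"]
human = ["agree", "agree", "neutral", "disagree"]
model = ["agree", "neutral", "neutral", "disagree"]

acc = accuracy(model, human)
tvd = total_variation(
    answer_distribution(model, options),
    answer_distribution(human, options),
)
```

To look at sub-populations, the same two functions could be applied separately to each group's answers.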
Thanks :) We have a BERT-based baseline model that labels individual tweets—but I agree, would be a very interesting comparison now that LLMs can increasingly handle super long contexts!
Thanks a lot for the shoutout! Would be happy to talk about this and other ongoing projects on social simulation at #ICWSM next week 🙂
Our conclusion: Temporal Adapters enable longitudinal analyses of affect aggregates from social media data by temporally aligning LLMs. ⏱️
Read the full paper: ojs.aaai.org/index.php/IC...
Our estimates with Llama 3 Temporal Adapters show a strong positive and significant correlation with collective frustration, fear, boredom, and sadness. Our results vary strongly between emotions, but they are in line with a baseline method's estimates.
We also apply our method to the extraction of public attitudes towards Boris Johnson as a prime minister and towards the National Health Service, where we similarly find positive cross-correlation with survey data for some but not all answer options.
Results: For several collective emotions and public opinion measures, our longitudinal estimates show a strong positive and significant cross-correlation with survey data gathered by YouGov directly from human participants.
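A minimal sketch of a lagged cross-correlation between two weekly series, assuming both series have the same length and no missing weeks. The data, the function names, and the lag range are hypothetical; the sketch uses a plain Pearson correlation per lag and does not include the significance testing mentioned above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def cross_correlation(model, survey, max_lag=2):
    """Correlate model[t] with survey[t + lag] for each lag in the range."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = model[: len(model) - lag], survey[lag:]
        else:
            a, b = model[-lag:], survey[: len(survey) + lag]
        result[lag] = pearson(a, b)
    return result


# Hypothetical weekly estimates; the survey series is a rescaled copy,
# so the correlation at lag 0 is perfect by construction.
model_series = [0.2, 0.4, 0.6, 0.5, 0.3, 0.1]
survey_series = [2 * v + 0.1 for v in model_series]

cc = cross_correlation(model_series, survey_series)
```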
Overview of our method that shows how each week's Twitter data is used to train a separate Temporal Adapter, and how a weekly affect aggregate is then obtained from the LLM's token probabilities.
Method: We gather weekly text data from a panel of Twitter users and fine-tune Temporal Adapters for Llama 3 8B with it. 🦙 We then prompt Llama with established survey questions, one week at a time, to extract longitudinal affect aggregates.
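A minimal sketch of the last step, assuming the model's log-probabilities for each answer-option token are already available for every week. The option labels, the week keys, the numbers, and the function names are hypothetical and not taken from the paper.

```python
import math


def option_probabilities(option_logprobs):
    """Renormalize log-probabilities over the answer options (softmax)."""
    max_lp = max(option_logprobs.values())
    exp = {opt: math.exp(lp - max_lp) for opt, lp in option_logprobs.items()}
    total = sum(exp.values())
    return {opt: value / total for opt, value in exp.items()}


def weekly_aggregate(option_logprobs, target_option="Yes"):
    """Share of probability mass on one option, used as that week's estimate."""
    return option_probabilities(option_logprobs)[target_option]


# Hypothetical log-probabilities of the "Yes"/"No" tokens, one entry per week.
# In the setup described above, each week would come from its own adapter.
weeks = {
    "2020-W13": {"Yes": math.log(0.30), "No": math.log(0.10)},
    "2020-W14": {"Yes": math.log(0.20), "No": math.log(0.20)},
}

series = {week: weekly_aggregate(lp) for week, lp in weeks.items()}
```

Renormalizing over the answer options discards the probability mass the model puts on unrelated tokens, so each week's estimate lies between 0 and 1.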
A line plot that shows how scared people in the UK were over time, during the first COVID-19 lockdown. Our method (Llama 3 Temporal Adapters) produces estimates similar to the survey data gathered by YouGov.
Excited to present our paper with @maxpe.bsky.social, @dgarcia.eu, and @mstrohm.bsky.social next week at #ICWSM! ✨
We extend social simulation with LLMs to a longitudinal setting by fine-tuning Temporal Adapters—here's how: 🧵
We're excited to announce #DataFest Germany 2025 at LMU Munich, March 28-30! In this #hackathon, students from diverse study programs compete for the best insights and visualizations from an exclusive dataset within 48 hours. More info: www.datafest.de/home
Great to see such strong arguments for using "open-weight" LLMs! Maybe setting random seeds could be added to the advice to practitioners? Most interfaces seem to support this now—huggingface, OpenAI, Ollama, vllm,…
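A toy sketch of why a fixed seed matters when answers are sampled. It uses only Python's standard library as a stand-in for an LLM's sampling step and does not show the API of any of the interfaces named above; the option probabilities and the function name are hypothetical.

```python
import random


def sample_answers(option_probs, n, seed):
    """Draw n answers from a fixed option distribution with a seeded generator."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    options = list(option_probs)
    weights = [option_probs[o] for o in options]
    return rng.choices(options, weights=weights, k=n)


# Hypothetical answer distribution for one survey item.
probs = {"agree": 0.5, "neutral": 0.3, "disagree": 0.2}

run_a = sample_answers(probs, n=10, seed=42)
run_b = sample_answers(probs, n=10, seed=42)
same_seed_matches = run_a == run_b  # identical seed, identical draws
```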
Ready for another Computational Social Science Starter Pack?
Here is number 2! More amazing folks to follow! Many students and the next gen represented!
go.bsky.app/GoEyD7d