There's plenty of evidence for political bias in LLMs, but very few evals reflect realistic LLM use cases, which is where bias actually matters.
IssueBench, our attempt to fix this, has been accepted at TACL, and I will be at #EMNLP2025 next week to talk about it!
New results 🧵
Posts by Musashi Hinck
apparently you're supposed to boil the water before you fill the bottom bit, because you want to avoid scalding your data as much as possible
New job ad: Assistant Professor of Quantitative Social Science, Dartmouth College apply.interfolio.com/172357
Please share with your networks. I am the search chair and happy to answer questions!
Exciting work coming from @pranavgoel.bsky.social looking at the effect of ChatGPT and similar tools on web browsing habits.
When people use these tools do they tend to stay on the platform instead of being referred elsewhere? Could this lead to the end of the open web? #pacss2025 #polnet2025
💯, when talking to AI doomers in 2023 I thought they had a naive view of how this technology would be integrated, but now it's looking like I am the naive one (still deeply skeptical of how many of their scenarios play out, though)
📢 New POSITION PAPER: Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts
Despite recent results, SAEs aren't dead! They can still be useful to mech interp, and also much more broadly: across FAccT, computational social science, and ML4H. 🧵
Grateful to win Best Paper at ACL for our work on Fairness through Difference Awareness with my amazing collaborators!! Check out the paper for why we think fairness has both gone too far, and at the same time, not far enough aclanthology.org/2025.acl-lon...
New working paper: "Survey Estimates of Wartime Mortality," with Gary King, available at gking.harvard.edu/sibs. We provide the first formal proofs of the statistical properties of existing mortality estimators, along with empirical illustrations, to develop intuitions that guide best practices.
Love this! Especially the explicit operationalization of what "bias" they are measuring via specifying the relevant counterfactual.
Definitely an approach that more papers talking about effects could incorporate to better clarify the phenomenon they are studying.
On second thought definitely two!
I'd do 1 or 2. Definitely get an egg custard (tart) as a snack too :) Enjoy!
New paper with Rebecca Johnson (@rebeccaj.bsky.social) on parental perceptions of using algorithms to allocate scarce resources in schools, now out in Sociological Science (@sociologicalsci.bsky.social):
Thrilled to share that this is out in @pnas.org today!
We show that linguistic generalization in language models can be due to underlying analogical mechanisms.
Shoutout to my amazing co-authors @weissweiler.bsky.social, @davidrmortensen.bsky.social, Hinrich Schütze, and Janet Pierrehumbert!
How can we perfectly erase concepts from LLMs?
Our method, Perfect Erasure Functions (PEF), erases concepts perfectly from LLM representations. We analytically derive PEF w/o parameter estimation. PEFs achieve a Pareto-optimal erasure-utility tradeoff backed with theoretical guarantees. #AISTATS2025 🧵
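For intuition about what "erasing a concept from representations" means, here is a minimal sketch of a generic *linear* erasure baseline that projects out the class-mean difference direction. This is my own illustration, not the paper's PEF method, and the function name and synthetic setup are hypothetical:

```python
import numpy as np

def erase_direction(X, labels):
    """Simple linear concept erasure (a baseline, NOT the paper's PEF):
    remove the component of every representation along the direction
    separating the two concept classes (difference of class means)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)
    d = d / np.linalg.norm(d)          # unit concept direction
    return X - np.outer(X @ d, d)      # project onto d's complement
```

After this projection the two class means coincide along the erased direction, but unlike PEF it comes with no guarantee about nonlinear recoverability of the concept.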
How does the public conceptualize AI? Rather than self-reported measures, we use metaphors to understand the nuance and complexity of people's mental models. In our #FAccT2025 paper, we analyzed 12,000 metaphors collected over 12 months to track shifts in public perceptions.
💡 Ever wondered how social media and digital technology shape our democracy?
Join our team @CSMaP_NYU as a Research Engineer and help us build the tools that power cutting-edge research on the digital public sphere.
Apply now!
apply.interfolio.com/165833
It is critical for scientific integrity that we trust our measure of progress.
The @lmarena.bsky.social has become the go-to evaluation for AI progress.
Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.
The mods of r/ChangeMyView shared the sub was the subject of a study to test the persuasiveness of LLMs & that they didn't consent. There's a lot that went wrong, so here's a 🧵 unpacking it, along with some ideas for how to do research with online communities ethically. tinyurl.com/59tpt988
Excited to be presenting "LLMs in Qualitative Research: Uses, Tensions, and Intentions" with @mariannealq.bsky.social at #CHI2025 today!
Paper: dl.acm.org/doi/10.1145/...
On point 1, you can account for this bias with tools like Design-based Supervised Learning (naokiegami.com/dsl/)!
This framework uses a small number of randomly sampled gold standard labels to correct bias in downstream estimates based on error-prone proxies like LLM annotations
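To illustrate the core bias-correction idea (this is a toy difference estimator I wrote for intuition, not the actual interface of the dsl package), a small random gold-labeled subsample estimates the proxy's average error, which is then added back to the naive proxy-based estimate:

```python
import random

rng = random.Random(0)

# Simulated corpus: true binary labels and error-prone LLM proxy labels
n = 10_000
truth = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
# The proxy flips ~15% of labels, so its raw mean is biased toward 0.5
proxy = [1 - t if rng.random() < 0.15 else t for t in truth]

mean = lambda xs: sum(xs) / len(xs)

# Hand-label a small, randomly sampled gold-standard subset
gold = rng.sample(range(n), 500)

naive = mean(proxy)  # biased estimate from proxy labels alone
# Correction: add the average proxy error observed on the gold subsample
corrected = naive + mean([truth[i] - proxy[i] for i in gold])
```

Because the gold subsample is drawn at random, the correction term is an unbiased estimate of the proxy's error, so the corrected estimate is (approximately) unbiased regardless of how bad the LLM annotations are; the gold sample size only affects its variance.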
Logo for MIB: A Mechanistic Interpretability Benchmark
Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work?
We propose MIB: a Mechanistic Interpretability Benchmark!
A DeepSeek whale about to overthink until the Terminator tells it to answer right away.
Check out our new paper on benchmarking and mitigating overthinking in reasoning models!
From a simple observational measure of overthinking, we introduce Thought Terminator, a black-box, training-free decoding technique where RMs set their own deadlines and follow them.
arxiv.org/abs/2504.13367
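The deadline idea can be sketched in a few lines. This is a toy illustration of black-box, training-free deadline decoding with a stand-in generation function, not the paper's actual implementation; the names `deadline_decode` and `step_fn` are my own:

```python
def deadline_decode(step_fn, prompt, budget=64, stop="</answer>"):
    """Generate with a token budget. step_fn(text) -> next text chunk;
    it stands in for any black-box model API."""
    text = prompt
    for _ in range(budget):
        text += step_fn(text)
        if stop in text:
            return text  # model answered before the deadline
    # Deadline hit: cut off the chain of thought and demand an answer
    return text + "\n[DEADLINE] Answer now: "

# A dummy "model" that would ramble forever without a deadline
rambler = lambda text: " hmm"
out = deadline_decode(rambler, "Q: What is 2+2?", budget=8)
```

The point is that the intervention needs no weights or logits: it only counts generated tokens and injects an interrupt string, which is why it works on any model behind an API.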
ModernBERT or DeBERTaV3?
What's driving performance: architecture or data?
To find out we pretrained ModernBERT on the same dataset as CamemBERTaV2 (a DeBERTaV3 model) to isolate architecture effects.
Here are our findings:
And then our EMNLP paper last year finds that prompting LLaVA-style VLMs causes a loss in fidelity: aclanthology.org/2024.finding...
@carolin-holtermann.bsky.social, @paul-rottger.bsky.social and @a-lauscher.bsky.social develop a benchmark for this problem in arxiv.org/abs/2403.03814
Llama 4 system prompt. Highlighted text: "Respond in the language the user speaks to you in, unless they ask otherwise."
Language Fidelity (having an LLM reply in the same language as the user's query) has made its way into the #Llama4 system prompt!
Some interesting work from co-authors and myself on this problem (short thread):
- arxiv.org/abs/2403.03814
- aclanthology.org/2024.finding...
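A crude stand-in for a language-fidelity check can be built from Unicode script heuristics alone. This is my own sketch, not the evaluation from the papers above; note that script is weaker than language (French and English share the Latin script), so real evals use proper language-ID models:

```python
import unicodedata

def dominant_script(text):
    """Crude script detector: tally the first word of each letter's
    Unicode name (e.g. 'LATIN', 'CYRILLIC', 'CJK')."""
    counts = {}
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            script = name.split()[0] if name else "UNKNOWN"
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "UNKNOWN"

def same_script(query, reply):
    """Proxy for language fidelity: does the reply use the same
    dominant script as the query?"""
    return dominant_script(query) == dominant_script(reply)

same_script("Привет, как дела?", "Хорошо, спасибо!")  # True
same_script("Bonjour!", "你好")                        # False
```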
Check out the paper at:
Paper: arxiv.org/abs/2504.07072
Data: hf.co/datasets/Coh...
Website: cohere.com/research/kal...
Huge thanks to everyone involved! This was a big collaboration!
[New preprint!] Do Chinese AI Models Speak Chinese Languages? Not really. Chinese LLMs like DeepSeek are better at French than Cantonese. Joint work with
Unso Jo and @dmimno.bsky.social. Link to paper: arxiv.org/pdf/2504.00289
🧵
Is the needle-in-a-haystack test still meaningful given the giant green heatmaps in modern LLM papers?
We create ONERULER, a multilingual long-context benchmark that allows for nonexistent needles. Turns out NIAH isn't so easy after all!
Our analysis across 26 languages 🧵
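The needle-in-a-haystack setup, including the nonexistent-needle twist, is easy to sketch. This is my own toy constructor, not ONERULER's actual task format; the helper name and question wording are hypothetical:

```python
import random

def build_niah_case(filler, needle=None, n_sentences=200, seed=0):
    """Build one needle-in-a-haystack test case: repeat a filler
    sentence and insert the needle at a random position. With
    needle=None the correct answer is 'none' (the nonexistent-needle
    setting, which punishes models that always guess a needle)."""
    rng = random.Random(seed)
    hay = [filler] * n_sentences
    if needle is not None:
        hay.insert(rng.randrange(len(hay) + 1), needle)
    question = "What is the magic number mentioned above? Say 'none' if absent."
    return " ".join(hay) + "\n" + question, (needle or "none")

prompt, answer = build_niah_case("Grass is green.",
                                 needle="The magic number is 42.")
```

Varying the filler language, needle language, and context length independently is what turns this single-case recipe into a multilingual long-context benchmark.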