Another one of @ahalterman.bsky.social and @katakeith.bsky.social's papers that I think should be cited more by CSS researchers:
What is a protest anyway? Codebook conceptualization is still a first-order concern in LLM-era classification
arxiv.org/abs/2510.03541
Posts by Andy Halterman
Currently in FirstView: In “Codebook LLMs: Evaluating LLMs as Measurement Tools for Political Science Concepts,” @ahalterman.bsky.social and @katakeith.bsky.social show how “off-the-shelf” LLMs have limitations in faithfully following real-world codebook operationalizations.
Very short summary of this paper:
Very excited that my paper with @katakeith.bsky.social is now out in @polanalysis.bsky.social. We investigate whether LLMs actually follow the instructions/definitions provided in codebooks, propose some diagnostics, and release a new evaluation dataset.
www.cambridge.org/core/journal...
Currently in FirstView: “Synthetically generated text for supervised text analysis.” @ahalterman.bsky.social proposes using LLMs to generate synthetic text for training smaller, traditional supervised text models.
I promise that training models on synthetic text is a better idea than it sounds. For the theoretically squeamish: think of this as model distillation (LLM --> small classifier). For the hardcore empiricists, there are a few F1-go-up plots.
New paper in Political Analysis on synthetic text data for training classifiers. Main idea: generate training examples with LLMs, then fit classifiers on synthetic (+real) text. Paper has validations and guidance.
Blog: andrewhalterman.com/post/synthet...
Paper: www.cambridge.org/core/journal...
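The distillation idea above (LLM → small classifier) can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: the labeled examples below are hypothetical stand-ins for LLM-generated text, and the model choice (TF-IDF plus logistic regression via scikit-learn) is just one common "small classifier" option.

```python
# Sketch of the synthetic-text -> small-classifier workflow.
# In practice, the synthetic examples would come from prompting an LLM
# for labeled text per class; here they are hand-written stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical LLM-generated training examples (1 = protest, 0 = not)
synthetic = [
    ("Thousands marched downtown demanding wage reform.", 1),
    ("Demonstrators gathered outside parliament with banners.", 1),
    ("The city council approved the annual budget on Tuesday.", 0),
    ("A new bakery opened on Main Street this weekend.", 0),
]
texts, labels = zip(*synthetic)

# Distill into a small, cheap supervised model trained on the synthetic
# (optionally plus real) text
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# The fitted pipeline can now label new documents without calling an LLM
prediction = clf.predict(["Thousands marched downtown demanding wage reform."])
```

The paper also mixes in real labeled data and discusses validation; see the linked blog post and article for the actual recipe.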
A McSweeney's-style parody. TREADSTONE RECRUITMENT OR YOUR PHD PROGRAM? Can you tell whether each quote is discussing a top-secret CIA black ops program or your department's graduate program?
1. "We'd hoped it might build into a good training platform, but quite honestly, for a strictly theoretical exercise, we thought it was far too expensive."
2. "You came to us. You volunteered. You said you'd do anything it takes."
3. "You haven't slept for a long time, have you? Have you made a decision? This can't go on, you know. You have to decide."
4. "The details? No. I mean, I was told it was voluntary. I don't know if that's true or not, but that's what I was told."
5. "Stop running from the truth. You chose to come here! You chose to stay! And no matter how much you want to forget it... eventually you're going to have to face how you chose to become [who you are]."
6. "You could have left at any time. And you knew exactly what it meant for you if you chose to stay."
7. "You're not a liar are you? Or too weak to see this through?"
8. "Look, they took vulnerable subjects, okay? You mix that with the right pharmacology and some serious behavior modification..."
9. "You made yourself into who you are."
I couldn't find a tutorial I liked to get students who know R up and running with Python, so I wrote my own! Part 1 is here: andrewhalterman.com/post/python_...
(And if you have a tutorial you like, please let me know!)
The "focus group" meme from I Think You Should Leave. The original text, "A great steering wheel that doesn't whiff out the window while I driving" is replaced with "A great steering wheel that whiffs out the window while I driving (Schelling 1966)"
The optimal amount of dad joke cringe when teaching undergrads is > 0.