✨The NLP+CSS workshop is returning to ACL 2026!✨
And this year, we have a new shared task with prizes!
Website/CfP: sites.google.com/site/nlpandc...
Deadlines: March 5 (direct), March 24 (pre-reviewed ARR)
#NLProc #CompSocialSci #ComputationalSocialScience #ACL2026NLP
@aclmeeting.bsky.social
Posts by Katie Keith
Very excited that my paper with @katakeith.bsky.social is now out in @polanalysis.bsky.social. We investigate whether LLMs actually follow the instructions/definitions provided in codebooks, propose some diagnostics, and release a new evaluation dataset.
www.cambridge.org/core/journal...
Whoa...!! If your work is at all social-science leaning, maybe try other preprint servers, SocArXiv for example? We put one of our preprints there: osf.io/preprints/so...
Yes! I agree. It's so rare these days to see a keynote that is so thorough and full of new conceptualizations.
5300 attendees in person here at #acl2025 😮
The #ACL2025 #ACL2025NLP feed is up and running! It matches both hashtags and any posts from or mentions of @aclmeeting.bsky.social
Pin it to your home 📌 and enjoy!
bsky.app/profile/did:...
Here's a topic @adeldaoud.bsky.social and I were discussing at lunch today at #ic2s2 that we want to ask about here:
What are the “known facts” in the social sciences? Which relationships between at least two social variables have been empirically found to have large effects and replicated by multiple groups?
Under review! Happy to share a draft if you email me. Thanks!
Thanks:)
Highlighting this thread. Based on what I'm seeing at #ic2s2 this week, this line of work is hot (if a bit crowded), but I predict it will only become more widely adopted by social scientists in the future.
Not as recent, but still LLM-based
"WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation." GPT-3 composes new examples with similar patterns to challenging examples.
aclanthology.org/2022.finding...
I thought this was a clever and useful paper from Xiong, ... Hovy, El-Assady, Ash: "Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification." It uses LLMs to help humans refine their codebooks before the codebooks are fixed for the true annotation stage. arxiv.org/pdf/2507.05010
We used active learning to create a human-annotated dataset of 1050 instances from FOMC transcripts, labeled for FOMC members' opinions and directional stance towards monetary policy. The preprint and dataset should be publicly released by the end of the summer, but email me for an advance copy.
Yay! I'm there as well. Let's sync up.
This was a top-down decision, and Williams faculty have yet to formally discuss it. It's unclear whether it is resistance or capitulation.
www.science.org/content/arti...
Honored by the Williams magazine's feature on my research, grant, and GPU cluster: today.williams.edu/magazine/a-c...
Personally, I find I have to burn a day answering all the questions (particularly for a dataset release). I think it should be condensed to the 5 most important ones.
A full room for @katakeith.bsky.social's talk on proximal causal inference with text data ✨✨✨
Mark your calendars for these upcoming events tied to SCI and its One-U Responsible AI Initiative! Visit rai.utah.edu/events for details.
@parasharmanish.bsky.social @katakeith.bsky.social @anamarasovic.bsky.social @freiling.bsky.social
Our semi-synthetic experiments use MIMIC-III clinical notes and two open-weight LLMs and show that our method produces estimates with low bias.
For settings with an unobserved (but known) confounding variable, we propose a new causal inference method that uses two instances of pre-treatment text data, infers two proxies using two zero-shot models on the separate instances, and applies these proxies in the proximal g-formula.
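To make the proximal g-formula step above a bit more concrete, here is a minimal sketch of its simplest discrete form, assuming binary proxies Z and W (standing in for the labels two zero-shot models might infer from the two separate pre-treatment texts), a binary treatment A, and a made-up simulated data-generating process. This is not the paper's implementation, which works with real text and LLMs; the function names simulate, naive_ate, proximal_ate and every number below are hypothetical.

```python
import random


def simulate(n, seed=0):
    """Hypothetical data-generating process with a binary unobserved confounder U.

    Z (treatment-side proxy) and W (outcome-side proxy) are illustrative stand-ins
    for zero-shot model outputs; both depend only on U. True effect of A on Y: 1.0.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = int(rng.random() < 0.5)
        a = int(rng.random() < (0.7 if u else 0.3))    # treatment confounded by U
        z = int(rng.random() < (0.8 if u else 0.2))    # proxy 1: depends only on U
        w = int(rng.random() < (0.75 if u else 0.25))  # proxy 2: depends only on U
        y = 1.0 * a + 2.0 * u + rng.gauss(0.0, 1.0)    # outcome
        rows.append((z, a, w, y))
    return rows


def naive_ate(rows):
    """Difference in means E[Y | A=1] - E[Y | A=0]; biased when U is unobserved."""
    means = {}
    for a in (0, 1):
        ys = [y for _, aa, _, y in rows if aa == a]
        means[a] = sum(ys) / len(ys)
    return means[1] - means[0]


def proximal_ate(rows):
    """Proximal g-formula with binary proxies Z, W and binary treatment A.

    For each a, solve the 2x2 system
        E[Y | Z=z, A=a] = sum_w h(w, a) * P(W=w | Z=z, A=a),  z in {0, 1}
    for the outcome bridge h(., a); then E[Y(a)] = sum_w h(w, a) * P(W=w).
    """
    n = len(rows)
    p_w1 = sum(w for _, _, w, _ in rows) / n  # marginal P(W=1)
    mean_y = {}
    for a in (0, 1):
        m = [[0.0, 0.0], [0.0, 0.0]]  # m[z][w] = P(W=w | Z=z, A=a)
        b = [0.0, 0.0]                # b[z] = E[Y | Z=z, A=a]
        for z in (0, 1):
            cell = [(w, y) for zz, aa, w, y in rows if zz == z and aa == a]
            p1 = sum(w for w, _ in cell) / len(cell)
            m[z] = [1.0 - p1, p1]
            b[z] = sum(y for _, y in cell) / len(cell)
        det = m[0][0] * m[1][1] - m[0][1] * m[1][0]  # near 0 means weak proxies
        h0 = (b[0] * m[1][1] - m[0][1] * b[1]) / det  # Cramer's rule
        h1 = (m[0][0] * b[1] - m[1][0] * b[0]) / det
        mean_y[a] = h0 * (1.0 - p_w1) + h1 * p_w1
    return mean_y[1] - mean_y[0]


rows = simulate(200_000, seed=0)
print(f"naive: {naive_ate(rows):.2f}, proximal: {proximal_ate(rows):.2f}")
```

Under this made-up process the naive difference in means should land near 1.8 (because P(U=1 | A=1) - P(U=1 | A=0) = 0.4 and U adds 2.0 to Y), while the proximal estimate should land near the true 1.0. The 2x2 solve only works when the two proxies are informative enough about U for the determinant to stay away from zero.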
Check out our #NeurIPS2024 poster (presented by my collaborators Jacob Chen and Rohit Bhattacharya) about “Proximal Causal Inference With Text Data” at 5:30pm tomorrow (Weds)!
neurips.cc/virtual/2024...
We're hiring new #nlp faculty this year!
Asst or Assoc Professors in NLP at UMass CICS --
careers.umass.edu/amherst/en-u...
I'm excited to share that we've released v1.0 of our podcast corpus, SPoRC, led by my PhD student Ben Litterer! This first dataset is a slice of time, comprising over one million episodes from May and June 2020, including transcripts, diarization, and extracted audio features.
Starter packs are genius, but I was surprised there wasn't a list of them for people to find.
So I built it:
blueskydirectory.com/starter-pack...
The website monitors the packs being shared and adds the ones it finds to the database.
Missed your starter pack? Message me and I'll get it added.
New here? Interested in AI/ML? Check out these great starter packs!
AI: go.bsky.app/SipA7it
RL: go.bsky.app/3WPHcHg
Women in AI: go.bsky.app/LaGDpqg
NLP: go.bsky.app/SngwGeS
AI and news: go.bsky.app/5sFqVNS
You can also search all starter packs here: blueskydirectory.com/starter-pack...
🫠🫶
In my NLP class (www.cs.williams.edu/~kkeith/teac...) next week, we're talking about eval.
I'd like a large section of the lecture to focus on contamination. Crowdsourcing: please send me your favorite contamination papers! Thanks! 🙏
go.bsky.app/PCckf3C