Pleased to share our new paper forthcoming in @icwsm.bsky.social! We introduce a novel framework to measure value expressions in social media posts at scale, leveraging personalization to handle the inherent subjectivity of human values.
arxiv.org/abs/2511.08453
Posts by Matt Groh
So, "that sucks" could be a great response demonstrating understanding and validating emotions. We very much find that we're not seeing the translation between people's self reported feelings of empathy and what normative empathic communication looks like.
And lastly, I'm sharing other threads pointing to our paper so it's easier to follow the conversation emerging from this work: bsky.app/profile/emol...
Thanks to @aakriti1kumar.bsky.social for leading this research, and to our awesome coauthors Fai Poungpeth, @diyiyang.bsky.social, and Bruce Lambert
This is a preprint, and we'd love your feedback to make this paper even better!
If you've made it this far, I highly suggest you check out the full paper here: arxiv.org/pdf/2603.15245
You'll find more details on all the results, and if you check out the supplementary information at the end, you'll find all the prompts for the AI coach and the RPG scenarios
That's empathy-as-a-trait measured via two well-established protocols: the Jordan Empathy Scale and the Single Item Trait Empathy Scale
We call this the “silent empathy effect” because people report feeling others’ pain yet lack fluency in the idioms known to make others feel heard.
So, we found a nearly 1 SD improvement in empathic communication after two rounds of practice with personalized AI coach feedback.
But what I found most surprising is that we see no relationship between empathic communication performance and empathy-as-a-trait
Now, how did we measure performance? We pre-registered a communication framework based on encouraging elaboration, validating emotions, demonstrating understanding, advice giving, self-orientation, and dismissing emotions. See our paper from last month for deets: nature.com/articles/s42...
But empathy isn't rocket science and it's been taught by humans in many forms. Take "7 Habits of Highly Effective People" as an example, which I'm sharing a portion of here to offer context:
We find that Bruce's #howcommunicationworks videos lead to an immediate increase in performance across our empathic communication framework, and the personalized AI coach leads to an even larger increase.
It may be surprising that AI can help us quickly learn hard-to-teach skills like empathy.
Or better yet, practice with an AI conversational partner, get personalized feedback, and have a conversation with our AI coach: human-ai-collaboration-lab.kellogg.northwestern.edu/rpg
Here's another one of Bruce's videos:
If you're experiencing a visceral reaction to this categorization of mis-attuned responses, you may not have mastered the art of empathic communication.
I highly suggest checking out Bruce Lambert's videos that we used in one arm of the randomized experiment
The most common mis-attuned responses are categorized as advice giving or dismissing emotions. The advice is generally good advice, but empathic support is usually not about advice; it's about being with people exactly where they are to help them process their emotions
The most common affective responses to personal troubles are acknowledging the difficulty of the situation and expressing sympathy by saying "I am so sorry to hear that."
With sparse autoencoder-based analyses, we can get extremely detailed and share the distribution of sub-categories within the four high-level categories of empathic expression: Affective, Cognitive, Motivational, and Mis-Attuned.
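If you're curious what that looks like mechanically, here's a toy sketch of a sparse autoencoder over message embeddings. The embedding size, dictionary size, and L1 coefficient are my illustrative assumptions, not the paper's actual settings:

```python
# Toy sparse autoencoder over message embeddings (illustrative only; the
# dimensions and L1 coefficient here are assumptions, not the paper's settings)
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_embed: int = 768, d_dict: int = 4096, l1_coef: float = 1e-3):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_dict)
        self.decoder = nn.Linear(d_dict, d_embed)
        self.l1_coef = l1_coef

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))  # non-negative, mostly-zero feature activations
        x_hat = self.decoder(z)          # reconstruct the original embedding
        return x_hat, z

    def loss(self, x: torch.Tensor) -> torch.Tensor:
        x_hat, z = self(x)
        recon = ((x - x_hat) ** 2).mean()  # reconstruction error
        sparsity = z.abs().mean()          # L1 term pushes most features to zero
        return recon + self.l1_coef * sparsity

# After training, each dictionary feature that reliably fires on a set of
# messages is a candidate sub-category; tallying activations across the corpus
# gives a distribution over sub-categories like the one described above.
```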
It turns out LLMs are fantastic at simulations like role-playing games: 91% of participants rated the scenarios a 4 or 5 out of 5 for realism. By collecting almost 17k messages in realistic text-based convos, we can map the idioms used in these empathic support contexts
First, context: how did we collect ~3k five-minute convos?
We built an LLM role-playing game and asked participants to lend an ear and provide empathic support to a conversational partner. We also embedded a pre-registered randomized experiment w/ comms coach interventions.
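For a flavor of the mechanics, here's a bare-bones sketch of the kind of role-play loop an LLM can power. The persona text, model name, and OpenAI client usage are illustrative assumptions, not our actual implementation (the real prompts are in the paper's SI):

```python
# Bare-bones LLM role-play loop (illustrative; the persona text, model name,
# and client choice are assumptions, not the study's actual implementation)
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The LLM plays someone with a personal trouble; the participant practices
# providing empathic support over text.
PERSONA = (
    "You are role-playing a person who was just passed over for a promotion. "
    "Share how you feel in short, natural text messages and respond "
    "realistically to the other person's attempts at support."
)

history = [{"role": "system", "content": PERSONA}]

def partner_reply(participant_message: str) -> str:
    """Append the participant's message and return the simulated partner's reply."""
    history.append({"role": "user", "content": participant_message})
    resp = client.chat.completions.create(model="gpt-4o", messages=history)
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(partner_reply("Hey, how are you holding up?"))
```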
AI can help us humans better understand how we connect.
Empathy is something most people feel strongly but fail to communicate effectively. Just see how people respond to someone passed over for a promotion
Insights from 3k convos between 968 people and LLMs from our new preprint 🧵
Thanks @emollick.bsky.social for sharing our latest work on AI and empathy! It's a preprint, so we would love feedback and comments.
The line-up for the AI and Innovation series at the Ryan Institute on Complexity in the Spring is pretty epic!
Mark your calendar for talks by @zhitzig.bsky.social, @allisonkoe.bsky.social, Sam Goldberg, and Pat Pataranutaporn
We're excited about the upcoming Computational Psychology preconference at @spspnews.bsky.social this Thursday. See our action-packed, full-day agenda below! Featuring 3 keynote talk themes with related early-career speakers, a data blitz session, and a panel discussion. Don't miss it! #SPSP
If you made it this far, I encourage you to check out the paper for the full story!
Big props to @aakriti1kumar.bsky.social, who led this paper, and to our wonderful team of interdisciplinary collaborators: Fai Poungpeth, Diyi Yang, Erina Farrell, and Bruce Lambert
Stay tuned for more!
Open questions: What is the right evaluation framework for a given conversational context? And when are LLMs less likely to be good judges? My suspicion on that last question: when there's high context between two people, like two close friends, an LLM's judgment is likely to hold less water.
🔵 LLMs could power coaching advice to everyday people on how to be better active listeners and make others feel more heard
🔵 LLMs could power scalable professional development offering nuanced evaluation to customer service reps, medical students, and therapists-in-training
With evidence that LLMs can judge empathic communication in these contexts, here are some future possibilities that I can imagine:
🔵 LLMs-as-judges can create transparency and accountability around when LLMs-as-companions might be going off the rails
Beyond the fact that objective ground truth is elusive, here's the problem w/ classification (toy demo after the list):
(a) imbalanced classes and off-by-one errors obscure performance
(b) variations in rating scales lead to incommensurability between scales
(c) binarization offers research degrees of freedom to juke the stats
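Here's a toy numeric demo of (a) and (c); the numbers are made up for illustration, not from the paper:

```python
# Toy demo (made-up numbers) of why plain accuracy misleads for rating tasks
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# (a) Imbalanced classes: if 90% of messages sit at 1 on a 1-5 scale, a judge
# that always answers 1 looks ~90% "accurate" while tracking nothing.
truth = rng.choice([1, 2, 3, 4, 5], size=1000, p=[0.90, 0.04, 0.03, 0.02, 0.01])
always_one = np.ones_like(truth)
print("constant judge accuracy:", (always_one == truth).mean())

# Off-by-one errors: a judge that is consistently one point high has terrible
# accuracy yet near-perfect rank correlation with the truth.
off_by_one = np.clip(truth + 1, 1, 5)
print("off-by-one accuracy:", (off_by_one == truth).mean())
rho, _ = spearmanr(off_by_one, truth)
print("off-by-one rank correlation:", rho)

# (c) Binarization: the choice of cut point swings the headline agreement
# number wildly, which is exactly the researcher degree of freedom at issue.
for cut in (1, 2, 3, 4):
    acc = ((off_by_one > cut) == (truth > cut)).mean()
    print(f"binarized at >{cut}: agreement = {acc:.2f}")
```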
Key finding: Across the same high-level task measured via four different frameworks in four different settings, we find that appropriately prompted LLMs judge the nuances of empathic communication nearly as reliably as experts
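To make "appropriately prompted" concrete, here's a bare-bones sketch of an LLM-as-judge call plus a reliability check against expert ratings. The rubric wording, model name, and parsing are my illustrative assumptions, not the paper's actual judge prompt:

```python
# Bare-bones LLM-as-judge sketch (rubric wording, model, and parsing are
# illustrative assumptions, not the paper's actual judge prompt)
from openai import OpenAI
from scipy.stats import spearmanr

client = OpenAI()

RUBRIC = (
    "Rate the supporter's final message from 1 (low) to 5 (high) on how well "
    "it validates the other person's emotions. Reply with a single integer."
)

def llm_judge(conversation: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": conversation},
        ],
    )
    return int(resp.choices[0].message.content.strip())

def reliability(conversations: list[str], expert_ratings: list[int]) -> float:
    """Rank correlation between LLM and expert ratings; comparing ranks rather
    than raw labels sidesteps the scale-use differences flagged in (b) above."""
    llm_ratings = [llm_judge(c) for c in conversations]
    rho, _ = spearmanr(llm_ratings, expert_ratings)
    return rho
```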
So, why not report simple AUC or accuracy?
Example conversations show how annotations converge and align across these three groups (experts, crowdworkers, and LLMs)
Crowd judgments are generally more positive and more variable, which we suspect is due to a combo of acquiescence bias and variability in their effort and experience (see more in the SI)