Pleased to share our new paper forthcoming in @icwsm.bsky.social! We introduce a novel framework to measure value expressions in social media posts at scale, leveraging personalization to handle the inherent subjectivity of human values.
arxiv.org/abs/2511.08453
Posts by Matt Groh
So, "that sucks" could be a great response demonstrating understanding and validating emotions. We very much find that we're not seeing the translation between people's self reported feelings of empathy and what normative empathic communication looks like.
And lastly, I'm sharing other threads pointing to our paper so it's easier to follow the conversation emerging from this work: bsky.app/profile/emol...
Thanks to @aakriti1kumar.bsky.social for leading this research, and to our awesome coauthors Fai Poungpeth, @diyiyang.bsky.social, and Bruce Lambert
This is a preprint, and we'd love your feedback to make this paper even better!
If you've made it this far, I highly suggest you check out the full paper here: arxiv.org/pdf/2603.15245
You'll find more details on all the results, and if you check out the supplementary information at the end, you'll find all the prompts for the AI coach and the RPG scenarios
That's empathy-as-a-trait measured via two well-established protocols: the Jordan Empathy Scale and the Single Item Trait Empathy Scale
We call this the “silent empathy effect” because people report feeling others’ pain yet lack fluency in the idioms known to make others feel heard.
So, we found a nearly 1 SD improvement in empathic communication after two rounds of practice with personalized AI coach feedback.
But what I found most surprising is that we see no relationship between empathic communication performance and empathy-as-a-trait
Now, how did we measure performance? We pre-registered a communication framework based on encouraging elaboration, validating emotions, demonstrating understanding, advice giving, self-orientation, and dismissing emotions. See our paper from last month for deets: nature.com/articles/s42...
But empathy isn't rocket science and it's been taught by humans in many forms. Take "7 Habits of Highly Effective People" as an example, which I'm sharing a portion of here to offer context:
We find that Bruce's #howcommunicationworks videos lead to an immediate increase in performance across our empathic communication framework, and the personalized AI coach leads to an even larger increase.
It may be surprising that AI can help us quickly learn hard-to-teach skills like empathy.
Or better yet, practice with an AI conversational partner, get personalized feedback, and have a conversation with our AI coach: human-ai-collaboration-lab.kellogg.northwestern.edu/rpg
Here's another one of Bruce's videos:
If you're experiencing a visceral reaction to this categorization of mis-attuned responses, you may not have mastered the art of empathic communication.
I highly suggest checking out Bruce Lambert's videos that we used in one arm of the randomized experiment
The most common mis-attuned responses are categorized as advice giving or dismissing emotions. The advice is generally good advice, but empathic support is usually not about advice; it's about being with people exactly where they are to help them process their emotions
The most common affective responses to personal troubles are acknowledging the difficulty of the situation and expressing sympathy by saying "I am so sorry to hear that."
With sparse autoencoder-based analyses, we can get extremely detailed and share the distribution of sub-categories within the four high-level categories of empathic expression: Affective, Cognitive, Motivational, and Mis-Attuned.
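If you're curious what that looks like mechanically, here's a toy sketch of a sparse autoencoder over message embeddings. The embedding size, dictionary size, and L1 coefficient are my illustrative assumptions, not the paper's actual settings:

```python
# Toy sparse autoencoder over message embeddings (illustrative only; the
# dimensions and L1 coefficient here are assumptions, not the paper's settings)
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_embed: int = 768, d_dict: int = 4096, l1_coef: float = 1e-3):
        super().__init__()
        self.encoder = nn.Linear(d_embed, d_dict)
        self.decoder = nn.Linear(d_dict, d_embed)
        self.l1_coef = l1_coef

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))  # non-negative, mostly-zero feature activations
        x_hat = self.decoder(z)          # reconstruct the original embedding
        return x_hat, z

    def loss(self, x: torch.Tensor) -> torch.Tensor:
        x_hat, z = self(x)
        recon = ((x - x_hat) ** 2).mean()  # reconstruction error
        sparsity = z.abs().mean()          # L1 term pushes most features to zero
        return recon + self.l1_coef * sparsity

# After training, each dictionary feature that reliably fires on a set of
# messages is a candidate sub-category; tallying activations across the corpus
# gives a distribution over sub-categories like the one described above.
```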
It turns out LLMs are fantastic at simulations like role-playing games: 91% of participants rated the scenarios a 4 or 5 out of 5 for realism. By collecting almost 17k messages in realistic text-based convos, we can map the idioms used in these empathic support contexts
First, context: how did we collect ~3k five-minute convos?
We built an LLM role-playing game and asked participants to lend an ear and provide empathic support to a conversational partner. We also embedded a pre-registered randomized experiment w/ comms coach interventions.
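For a flavor of the mechanics, here's a bare-bones sketch of the kind of role-play loop an LLM can power. The persona text, model name, and OpenAI client usage are illustrative assumptions, not our actual implementation (the real prompts are in the paper's SI):

```python
# Bare-bones LLM role-play loop (illustrative; the persona text, model name,
# and client choice are assumptions, not the study's actual implementation)
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The LLM plays someone with a personal trouble; the participant practices
# providing empathic support over text.
PERSONA = (
    "You are role-playing a person who was just passed over for a promotion. "
    "Share how you feel in short, natural text messages and respond "
    "realistically to the other person's attempts at support."
)

history = [{"role": "system", "content": PERSONA}]

def partner_reply(participant_message: str) -> str:
    """Append the participant's message and return the simulated partner's reply."""
    history.append({"role": "user", "content": participant_message})
    resp = client.chat.completions.create(model="gpt-4o", messages=history)
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(partner_reply("Hey, how are you holding up?"))
```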
AI can help us humans better understand how we connect.
Empathy is something most people feel strongly but fail to communicate effectively. Just see how people respond to someone passed over for a promotion
Insights from 3k convos between 968 people and LLMs from our new preprint 🧵
Thanks @emollick.bsky.social for sharing our latest work on AI and empathy! It's a preprint, so we would love feedback and comments.
The line-up for the AI and Innovation series at the Ryan Institute on Complexity in the Spring is pretty epic!
Mark your calendar for talks by @zhitzig.bsky.social, @allisonkoe.bsky.social, Sam Goldberg, and Pat Pataranutaporn
We're excited about the upcoming Computational Psychology preconference at @spspnews.bsky.social this Thursday. See our action-packed, full-day agenda below! Featuring 3 keynote talk themes with related early-career speakers, a data blitz session, and a panel discussion. Don't miss it! #SPSP
If you made it this far, I encourage you to check out the paper for the full story!
Big props to @aakriti1kumar.bsky.social, who led this paper, and to our wonderful team of interdisciplinary collaborators: Fai Poungpeth, Diyi Yang, Erina Farrell, and Bruce Lambert
Stay tuned for more!
Open questions: What is the right evaluation framework for a given conversational context? And when are LLMs less likely to be good judges? My suspicion on that last question: when there's high context between two people, like two close friends, an LLM's judgment is likely to hold less water.
🔵 LLMs could power coaching advice to everyday people on how to be better active listeners and make others feel more heard
🔵 LLMs could power scalable professional development offering nuanced evaluation to customer service reps, medical students, and therapists-in-training
With evidence that LLMs can judge empathic communication in these contexts, here are some future possibilities that I can imagine:
🔵 LLMs-as-judges can create transparency and accountability around when LLMs-as-companions might be going off the rails
Beyond the fact that objective ground truth is elusive, here's the problem w/ classification (toy demo after the list):
(a) imbalanced classes and off-by-one errors obscure performance
(b) variations in rating scales lead to incommensurability between scales
(c) binarization offers research degrees of freedom to juke the stats
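Here's a toy numeric demo of (a) and (c); the numbers are made up for illustration, not from the paper:

```python
# Toy demo (made-up numbers) of why plain accuracy misleads for rating tasks
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# (a) Imbalanced classes: if 90% of messages sit at 1 on a 1-5 scale, a judge
# that always answers 1 looks ~90% "accurate" while tracking nothing.
truth = rng.choice([1, 2, 3, 4, 5], size=1000, p=[0.90, 0.04, 0.03, 0.02, 0.01])
always_one = np.ones_like(truth)
print("constant judge accuracy:", (always_one == truth).mean())

# Off-by-one errors: a judge that is consistently one point high has terrible
# accuracy yet near-perfect rank correlation with the truth.
off_by_one = np.clip(truth + 1, 1, 5)
print("off-by-one accuracy:", (off_by_one == truth).mean())
rho, _ = spearmanr(off_by_one, truth)
print("off-by-one rank correlation:", rho)

# (c) Binarization: the choice of cut point swings the headline agreement
# number wildly, which is exactly the researcher degree of freedom at issue.
for cut in (1, 2, 3, 4):
    acc = ((off_by_one > cut) == (truth > cut)).mean()
    print(f"binarized at >{cut}: agreement = {acc:.2f}")
```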
Key finding: Across the same high-level task measured via four different frameworks in four different settings, we find that appropriately prompted LLMs judge the nuances of empathic communication nearly as reliably as experts
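To make "appropriately prompted" concrete, here's a bare-bones sketch of an LLM-as-judge call plus a reliability check against expert ratings. The rubric wording, model name, and parsing are my illustrative assumptions, not the paper's actual judge prompt:

```python
# Bare-bones LLM-as-judge sketch (rubric wording, model, and parsing are
# illustrative assumptions, not the paper's actual judge prompt)
from openai import OpenAI
from scipy.stats import spearmanr

client = OpenAI()

RUBRIC = (
    "Rate the supporter's final message from 1 (low) to 5 (high) on how well "
    "it validates the other person's emotions. Reply with a single integer."
)

def llm_judge(conversation: str) -> int:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": conversation},
        ],
    )
    return int(resp.choices[0].message.content.strip())

def reliability(conversations: list[str], expert_ratings: list[int]) -> float:
    """Rank correlation between LLM and expert ratings; comparing ranks rather
    than raw labels sidesteps the scale-use differences flagged in (b) above."""
    llm_ratings = [llm_judge(c) for c in conversations]
    rho, _ = spearmanr(llm_ratings, expert_ratings)
    return rho
```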
So, why not report simple AUC or accuracy?
Example conversations show how annotations converge and align across these three groups (experts, crowdworkers, and LLMs)
Crowd judgments are generally more positive and more variable, which we suspect is due to a combo of acquiescence bias and variability in their effort and experience (see more in the SI)