Advertisement · 728 × 90

Posts by Joel Mire

Preview
We the Platform | Columbia University Press Web 2.0 gave us the online world as we know it today. Popularized in 2004, it redefined the internet as social, a “platform” for self-expression and data... | CUP

We the Platform is available for preorder with the discount code CUP20 if you order directly from the press! cup.columbia.edu/book/we-the-... A thread on the argument below:

2 weeks ago 61 28 3 5
Screenshot of an academic paper titled "The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor" authored by Jordan Taylor, William Agnew, Maarten Sap, Sarah E. Fox, and Haiyi Zhu

Screenshot of an academic paper titled "The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor" authored by Jordan Taylor, William Agnew, Maarten Sap, Sarah E. Fox, and Haiyi Zhu

🎨💻 What is a “high-quality” or “aesthetic” image according to generative AI developers?

Happy to share that our investigation of the LAION-Aesthetics Predictor has been accepted at #FAccT2026! 🧵 (1/5)

Take a look at a preprint here: arxiv.org/abs/2601.09896

1 month ago 24 5 1 0

Cyberpunk!

2 months ago 1 0 0 0
Post image

🎭 How do LLMs (mis)represent culture?
🧮 How often?
🧠 Misrepresentations = missing knowledge? spoiler: NO!

At #CHI2026 we are bringing ✨TALES✨ a participatory evaluation of cultural (mis)reps & knowledge in multilingual LLM-stories for India

📜 arxiv.org/abs/2511.21322

1/10

2 months ago 46 22 1 2
Preview
CMU LTI Summer 2026 Internship Program Application We are looking for applicants for the Carnegie Mellon University Language Technology Institute's Summer 2026 "Language Technology for All" internship program. The main goal of this internship is to pr...

🚀 Apply to CMU LTI’s Summer 2026 “Language Technology for All” internship! 🎓 Open to pre‑doctoral students new to language tech (non‑CS backgrounds welcome). 🔬 12–14 weeks in‑person in Pittsburgh — travel + stipend paid. 💸 Deadline: Feb 20, 11:59pm ET. Apply → forms.gle/cUu8g6wb27Hs...

2 months ago 14 12 2 0
Post image Post image Post image

New paper to appear at EACL 2026 main conference, and it's now up on arxiv: arxiv.org/pdf/2505.17536. The character limit here is insane, so I'll let the screenshots speak for themselves. We put together a new dataset for conversational role attribution & thread disentanglement.

2 months ago 8 3 1 0

This was joint work with my wonderful collaborators @mariaa.bsky.social, @stevewilson.bsky.social, @zxma.bsky.social, Achyut Ganti, @andrewpiper.bsky.social, and advisor @maartensap.bsky.social, at CU Boulder, UM-Flint, @uconn.bsky.social, and @ltiatcmu.bsky.social!

4 months ago 3 0 0 0
Advertisement
Preview
Social Story Frames: Contextual Reasoning about Narrative Intent and Reception Reading stories evokes rich interpretive, affective, and evaluative responses, such as inferences about narrative intent or judgments about characters. Yet, computational models of reader response are...

Paper: arxiv.org/abs/2512.15925
Code: github.com/joel-mire/so...
SSF-Corpus: huggingface.co/datasets/joe...
SSF-Generator: huggingface.co/joelmire/lla...
SSF-Classifier: huggingface.co/joelmire/lla...

4 months ago 1 0 1 0

SocialStoryFrames helps illuminate the social topography of narrative practices and reader response in online communities, opening avenues for future research in computational social science and cultural analytics!

4 months ago 0 0 1 0
Scatterplot with x-axis for 'reader-centric normalized entorpy' and y-axis for 'author-centric normalized entropy'. Each point represents a subreddit and is color-coded according to a legend of topics (e.g., fashion, hobby).

Scatterplot with x-axis for 'reader-centric normalized entorpy' and y-axis for 'author-centric normalized entropy'. Each point represents a subreddit and is color-coded according to a legend of topics (e.g., fashion, hobby).

Lastly, we quantify the diversity (entropy) of author goals and reader reactions using SSF-Taxonomy.

Quadrants suggest distinct patterns: r/Frugal & r/techsupport have predictable scripts for authors and readers; r/Fitness has varied authorial approaches, predictable reader reactions.

4 months ago 2 0 1 0
Scatterplot comparing subreddit-pair similarity according to two similarity measures: semantic similarity and ssf-sim. The plot shows clusters of labels at regions of high and low agreement between the two measures. For example, both measures score the CFB-nfl highly, while only the ssf-sim measure scores apple-books highly.

Scatterplot comparing subreddit-pair similarity according to two similarity measures: semantic similarity and ssf-sim. The plot shows clusters of labels at regions of high and low agreement between the two measures. For example, both measures score the CFB-nfl highly, while only the ssf-sim measure scores apple-books highly.

SocialStoryFrames provides a new way—beyond semantic similarity—to compare social functions of storytelling across communities.

Even topically different communities (e.g., r/MakeupAddiction vs. r/buildapc) can show surprisingly similar social storytelling dynamics.

4 months ago 3 0 1 0
Bar plots for the overall_goal and narrative_intent dimensions of SSF-Taxonomy. Each bar plot shows relative proportion of dimensions sub-labels. For example, 'provide_info_support' has the highest proportion of overall_goal labels, followed by provide_experiential_accounts, and persuade_debate.

Bar plots for the overall_goal and narrative_intent dimensions of SSF-Taxonomy. Each bar plot shows relative proportion of dimensions sub-labels. For example, 'provide_info_support' has the highest proportion of overall_goal labels, followed by provide_experiential_accounts, and persuade_debate.

Using the framework, we analyze narrative intents across Reddit. Most common: justify/challenge belief (40%), clarify (14%), vent (14%), show identity (10%).

Association tests cast particular forms of storytelling (e.g., conveying similar experience) as mechanisms for empathy.

4 months ago 1 0 1 0
Table reporting inference classification performance for GPT-4.1 (k-shot) and SSF-Classifier across all SSF-Taxonomy dimensions.

Table reporting inference classification performance for GPT-4.1 (k-shot) and SSF-Classifier across all SSF-Taxonomy dimensions.

For inference classification, we build a zero-shot SSF-Classifier using the same distillation approach, guided by a k-shot GPT-4.1 teacher.

Expert human evaluation shows strong performance, with an average Micro F1 of 0.85 and Macro F1 of 0.79 across SSF-Taxonomy dimensions.

4 months ago 0 0 1 0
Stacked bar plot comparing human plausibility ratings for SSF-Classifier and GPT-4o across all SSF-Taxonomy dimensions.

Stacked bar plot comparing human plausibility ratings for SSF-Classifier and GPT-4o across all SSF-Taxonomy dimensions.

For inference generation, we create SSF-Generator via SFT distillation of GPT-4o on a Llama3.1 base.

We validate inference plausibility through a human survey (N=382).

94% of inferences were deemed plausible, with 78% deemed very/somewhat likely.

4 months ago 0 0 1 0
Advertisement

We define 2 tasks:
1. Generate plausible reader-response inferences given a story and its community/conversational context
2. Classify inferences onto taxonomy subdimensions.

We also curate SSF-Corpus: 6,140 storytelling contexts from 50+ Reddit communities.

4 months ago 0 0 1 0
Diagram of SSF-Taxonomy. Shows information flow from the author and community + conversational context into readers, who then have many different forms of reader response. These range from author-oriented inferences (overall goal, narrative intent, author emotional response), causal inferences (causal explanation, prediction), value judgments (character appraisal, moral, stance), and feelings (narrative feeling, aesthetic feeling).

Diagram of SSF-Taxonomy. Shows information flow from the author and community + conversational context into readers, who then have many different forms of reader response. These range from author-oriented inferences (overall goal, narrative intent, author emotional response), causal inferences (causal explanation, prediction), value judgments (character appraisal, moral, stance), and feelings (narrative feeling, aesthetic feeling).

We introduce SocialStoryFrames, a framework for contextual reasoning about narrative intent and reader response for social media stories.

Its foundation is SSF-Taxonomy, a taxonomy of dimensions of reader response, grounded in narrative theory, pragmatics, and psychology.

4 months ago 1 0 1 0
Screenshot of paper title and authors. 

Title: Social Story Frames: Contextual Reasoning about Narrative Intent and Reception
Authors: Joel Mire, Maria Antoniak, Steven R. Wilson, Zexin Ma, Achyutarama R. Ganti, Andrew Piper, Maarten Sap

Screenshot of paper title and authors. Title: Social Story Frames: Contextual Reasoning about Narrative Intent and Reception Authors: Joel Mire, Maria Antoniak, Steven R. Wilson, Zexin Ma, Achyutarama R. Ganti, Andrew Piper, Maarten Sap

Reading social media stories evokes a wide range of contextual reader reactions—inferential, affective, evaluative—yet we lack methods to study these at scale.

Excited to share our new paper that builds a framework for analyzing storytelling practices across online communities!

4 months ago 22 7 1 1
A staircase in the new School of Computer, Data & Information Sciences building at Wisconsin Madison. Tan wood structures surround tapestry art and a small indoor garden.

A staircase in the new School of Computer, Data & Information Sciences building at Wisconsin Madison. Tan wood structures surround tapestry art and a small indoor garden.

A view from above of the staircases in the Wisconsin CDIS building

A view from above of the staircases in the Wisconsin CDIS building

An shot from below of winding wooden staircases and a glass atrium rooftop. The new School of Computer, Data & Information Sciences building at Wisconsin Madison.

An shot from below of winding wooden staircases and a glass atrium rooftop. The new School of Computer, Data & Information Sciences building at Wisconsin Madison.

A bicolor white cat with seal-colored markings, looking upwards with big wide dark eyes.

A bicolor white cat with seal-colored markings, looking upwards with big wide dark eyes.

It's the season for PhD apps!! 🥧 🦃 ☃️ ❄️

Apply to Wisconsin CS to research
- Societal impact of AI
- NLP ←→ CSS and cultural analytics
- Computational sociolinguistics
- Human-AI interaction
- Culturally competent and inclusive NLP
with me!

lucy3.github.io/prospective-...

5 months ago 51 16 1 1
Performance of a sweep of models on Oolong-synth and Oolong-real. Performance decreases with increasing context length, sometimes steeply.

Performance of a sweep of models on Oolong-synth and Oolong-real. Performance decreases with increasing context length, sometimes steeply.

Can LLMs accurately aggregate information over long, information-dense texts? Not yet…

We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!

5 months ago 50 20 3 3
Post image

I'm recruiting multiple PhD students for Fall 2026 in Computer Science at @hopkinsengineer.bsky.social 🍂

Apply to work on AI for social sciences/human behavior, social NLP, and LLMs for real-world applied domains you're passionate about!

Learn more at kristinagligoric.com & help spread the word!

5 months ago 30 17 0 1
Post image

How and when should LLM guardrails be deployed to balance safety and user experience?

Our #EMNLP2025 paper reveals that crafting thoughtful refusals rather than detecting intent is the key to human-centered AI safety.

📄 arxiv.org/abs/2506.00195
🧵[1/9]

6 months ago 9 3 1 0
Post image Post image

10 years after the initial idea, Artificial Humanities is here! Thanks so much to all who have preordered it. I hope you enjoy reading it and find this research approach as generative as I do. More to come!

6 months ago 38 13 5 3
Academic paper titled un-straightening generative ai: how queer artists surface and challenge the normativity of generative ai models

The piece is written by Jordan Taylor, Joel Mire, Franchesca Spektor, Alicia DeVrio, Maarten Sap, Haiyi Zhu, and Sarah Fox.

As an image titled 24 attempts at intimacy showing 24 ai generated images with the word intimacy, none of which seems to include same gender couples

Academic paper titled un-straightening generative ai: how queer artists surface and challenge the normativity of generative ai models The piece is written by Jordan Taylor, Joel Mire, Franchesca Spektor, Alicia DeVrio, Maarten Sap, Haiyi Zhu, and Sarah Fox. As an image titled 24 attempts at intimacy showing 24 ai generated images with the word intimacy, none of which seems to include same gender couples

🏳️‍🌈🎨💻📢 Happy to share our workshop study on queer artists’ experiences critically engaging with GenAI

Looking forward to presenting this work at #FAccT2025 and you can read a pre-print here:
arxiv.org/abs/2503.09805

11 months ago 27 4 2 0
Post image

When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:

🧵1/9

10 months ago 72 21 2 2
Advertisement
An overview of the work “Research Borderlands: Analysing Writing Across Research Cultures” by Shaily Bhatt, Tal August, and Maria Antoniak. The overview describes that We  survey and interview interdisciplinary researchers (§3) to develop a framework of writing norms that vary across research cultures (§4) and operationalise them using computational metrics (§5). We then use this evaluation suite for two large-scale quantitative analyses: (a) surfacing variations in writing across 11 communities (§6); (b) evaluating the cultural competence of LLMs when adapting writing from one community to another (§7).

An overview of the work “Research Borderlands: Analysing Writing Across Research Cultures” by Shaily Bhatt, Tal August, and Maria Antoniak. The overview describes that We survey and interview interdisciplinary researchers (§3) to develop a framework of writing norms that vary across research cultures (§4) and operationalise them using computational metrics (§5). We then use this evaluation suite for two large-scale quantitative analyses: (a) surfacing variations in writing across 11 communities (§6); (b) evaluating the cultural competence of LLMs when adapting writing from one community to another (§7).

🖋️ Curious how writing differs across (research) cultures?
🚩 Tired of “cultural” evals that don't consult people?

We engaged with interdisciplinary researchers to identify & measure ✨cultural norms✨in scientific writing, and show that❗LLMs flatten them❗

📜 arxiv.org/abs/2506.00784

[1/11]

10 months ago 72 30 1 5

This looks incredible! Thanks for sharing the syllabus!

10 months ago 1 0 0 0
Post image

I’m thrilled to share RewardBench 2 📊— We created a new multi-domain reward model evaluation that is substantially harder than RewardBench, we trained and released 70 reward models, and we gained insights about reward modeling benchmarks and downstream performance!

10 months ago 22 6 2 1
Wisconsin-Madison's tree-filled campus, next to a big shiny lake

Wisconsin-Madison's tree-filled campus, next to a big shiny lake

A computer render of the interior of the new computer science, information science, and statistics building. A staircase crosses an open atrium with visibility across multiple floors

A computer render of the interior of the new computer science, information science, and statistics building. A staircase crosses an open atrium with visibility across multiple floors

I'm joining Wisconsin CS as an assistant professor in fall 2026!! There, I'll continue working on language models, computational social science, & responsible AI. 🌲🧀🚣🏻‍♀️ Apply to be my PhD student!

Before then, I'll postdoc for a year in the NLP group at another UW 🏔️ in the Pacific Northwest

11 months ago 145 14 16 3
Post image

When interacting with ChatGPT, have you wondered if they would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR ," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 🤯 1/

11 months ago 25 9 1 3
A bar plot comparing the storytelling rates for different topics in the example dataset of congressional speeches. There are often large differences between storytelling and non-storytelling for individual topics. For example, the topic whose top words read "NUM, years, service, great, state" has much more storytelling that non-storytelling.

A bar plot comparing the storytelling rates for different topics in the example dataset of congressional speeches. There are often large differences between storytelling and non-storytelling for individual topics. For example, the topic whose top words read "NUM, years, service, great, state" has much more storytelling that non-storytelling.

The top five congressional speeches for the topic "NUM, years, service, great state." All of the documents honor the lives of important people.

The top five congressional speeches for the topic "NUM, years, service, great state." All of the documents honor the lives of important people.

I updated our 🔭StorySeeker demo. Aimed at beginners, it briefly walks through loading our model from Hugging Face, loading your own text dataset, predicting whether each text contains a story, and topic modeling and exploring the results. Runs in your browser, no installation needed!

1 year ago 20 6 1 0