Sagar Kumar (@diesagar) Bsky

Large Language Models Reproduce Racial Stereotypes When Used for Text Annotation
Large language models (LLMs) are increasingly used for automated text annotation in tasks ranging from academic research to content moderation and hiring. Across 19 LLMs and two experiments totaling m...

AI systems are now making decisions about you — who gets hired, whose content gets removed, whose voice gets heard.

My new paper finds something concerning:

LLMs infer your ethnic identity from subtle textual cues…

… and use it to discriminate against you.

🧵 1/

1 month ago

Great new preprint by a student in my lab, demonstrating how many benchmarks and safeguards of LLMs are ill-conceived and unreliable. Excellent thread breaking down the core findings.

arxiv.org/abs/2603.23485

3 weeks ago

Failure of contextual invariance in gender inference with large language models
Standard evaluation practices assume that large language model (LLM) outputs are stable under contextually equivalent formulations of a task. Here, we test this assumption in the setting of gender inf...

MASSIVE thank you to the brilliant @ariel-flint.bsky.social @lajello.bsky.social and @baronca.bsky.social for their help on this work!! Check out the preprint if you want to learn more: arxiv.org/abs/2603.23485

#LingSky #ResponsibleAI #AI #NLP #MachineLearning

3 weeks ago

Given these findings, we may have to start asking more questions about what our evaluations are actually measuring.

3 weeks ago

Not only do we find that this minor change can completely remove or reverse model #bias, we also find that the irrelevant pronoun becomes by far the most informative feature, and that this dependence is irreducible: it cannot be explained as simple pronoun repetition.

3 weeks ago

Gender biases, which we often assume leak into the model through training, completely vanish when an irrelevant character in the discourse is referred to by “he” instead of “she”.

3 weeks ago

When we slightly enrich the discourse with a single sentence, we find that changing one pronoun that is entirely uninformative for the task completely changes model behavior.

3 weeks ago

Turns out that context definitely shapes LLM responses—even when it shouldn’t.

3 weeks ago

In real-life settings, this is rarely the case: most language use, whether with people or with generative language models, includes *context*.

#Pragmatics 101 tells us that we are always looking to the relevant parts of the discourse when we interpret what is being said. Do LLMs do the same?

3 weeks ago

With AI being deployed in just about every industry imaginable, a lot of time and energy is put into evaluating #LLMs to make sure the systems we deploy are behaving the way we want them to. These evaluations and benchmarks, however, are usually done using single, isolated sentences.

3 weeks ago

Are our evaluations actually measuring any stable properties of LLMs? 🧵

3 weeks ago

Finally learning Montague Grammar and it has unironically brought me to tears because of how convoluted and ridiculous and well thought out and elegant it is #lingsky #semantics

1 month ago

"Not evil in some movie sense. Silent partners in a real world sense." well damn

2 months ago

telling myself that if i actually learned what a sheaf is then i would be far too powerful and that’s why it continues to evade my understanding

3 months ago

say what you want about the findings, i can’t get over the fact that they have an interpolated line connecting the dots in a categorical plot 😭

5 months ago

Thrilled to see this out! Congrats to collaborators and please give it a read. Since we wrote this, it’s only become more pressing and timely.

5 months ago

Locating the Asymmetry in Information Flow between Local and National Media on Transgender Discourses - Bulletin of Applied Transgender Studies
Mainstream news outlets set the agenda and terms of discussion for public discourse. As transgender people experience increasingly vitriolic attacks on their fundamental rights in the US, understandin...

hot off the (virtual) press in @transstudies.bsky.social: a collaboration with @pranavgoel.bsky.social, @crazybrokeasian.bsky.social, and @diesagar.bsky.social on information flows in US local & national news about transgender people (spoiler alert: it’s complicated)

doi.org/10.57814/557...

5 months ago

i’m blue da boo dee da boo da john cho please save us

7 months ago

the state of media right now has me feeling like frankie muniz in big fat liar

7 months ago

calling him a "conservative activist" is... a fascinating framing

7 months ago

Northern Ireland man from Lisburn kidnapped by Trump’s ICE raids because he “looked like a Mexican”. He had the correct visa, all his documents. He’s never been in trouble with police.

Held in terrible conditions. A fellow captive couldn’t get his medication and dropped dead of a heart attack.

7 months ago

ICE released this Mass. mom with no phone, 30 miles from home in the rain after detainment for a sealed marijuana conviction
Federal authorities refused to tell the Canton woman's husband or lawyer on what grounds she was being detained until shortly before her release.

“Jimenez Rosa, a legal permanent resident and mother of four U.S. citizens, was detained over what her lawyer believes was a decades-old, personal-use marijuana charge, which is no longer a crime in Massachusetts today… “I was just like, ‘Girls, we might never see your mother again in this country’”

7 months ago

Boston’s Logan Airport clearly has a preference when it comes to fundamental particles because I haven’t seen a Fermion sign anywhere

8 months ago

Just published my first little bit of science journalism on this great article by @kennysmithed.bsky.social and Jennifer Culbertson about whether communicative efficiencies in language (in this case, differential case marking) emerge through learning or through use. Please give it a look!!

8 months ago

Screenshot reading:
Waves of Attention to Racial Injustice on Social Media: Extrajudicial Police Killings in the United States as Focusing Events
Authors: Annie Waldherr, Nicola Righetti, Ryan J. Gallagher, Kira Klinger, Daniela Stoltenberg, Sagar Kumar, Dominic Ridley, and Brooke Foucault Welles
Abstract: The deaths of Black victims of police brutality, such as George Floyd, Breonna Taylor, Sandra Bland, and Philando Castile, have become focusing events and symbols for the Black Lives Matter (BLM) movement, catalyzing widespread public attention to racial injustice. While prior studies on hashtag activism predominantly focus on single and widely known cases, less is understood about why some incidents draw massive public attention while others do not. Addressing this gap, our study investigates the factors influencing the likelihood and size of public attention on Twitter (now X) following extrajudicial police killings. We analyzed 1.5 million tweets in response to 795 police killings between January 1, 2015, and December 8, 2016, in the United States. By examining cases on all scales, from unnoticed to prominent, we provide large-scale empirical evidence on disparities in public attention to police killings and their victims. Results indicate two distinct processes in the emergence of focusing events: While victims’ attributes such as race, age, and gender increased likelihood of receiving any attention (thresholding), variables of context and social construction were related to overall wave size (focusing).

Second, with a large team of authors, led by @anniewald.bsky.social, we study which (personal, temporal, spatial, affordance-based) properties of incidents of extrajudicial police killings facilitate public attention on social media.

8 months ago

Couldn't be more proud to see this publication out!!! So grateful for the wonderful team. Please give it a look!!

8 months ago

William (Bill) Labov (1927–2024) | Language in Society | Cambridge Core
Volume 54 Issue 3

Absolutely beautiful obituary for Bill Labov, describing a life rich in insights and in humanity www.cambridge.org/core/journal...

8 months ago

LLM research for the last few years has been constantly saying "I can't put this in the paper--nobody would use a model that just predicts the next token to make decisions about [insert life-altering scenario here]", then seeing an article titled "Can AI help make decisions about nuclear warfare?"

8 months ago

This is the closest thing i’ve come to a noumenal experience.

i have literally no reference for it. it is only itself-in-itself.

every part of it contradicts itself in so many ways that the only thing left is The Lacanian Real. The Truth.

9 months ago
