
Posts by Rupak

LLMs didn’t move language modeling research from linguists to AI people; they just moved it from computer scientists who thought language was interesting to computer scientists who thought language was boring

4 months ago

AI is already at work in American newsrooms.

We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea.

Here's what we learned about how AI is influencing local and national journalism:

6 months ago
Predoctoral Research Assistant (Contract) – Computational Social Science – Microsoft Research
Are you a recent college graduate wishing to gain research experience prior to pursuing a Ph.D. in fields related to computational social science (CSS)? Do you have a deep love of “playing with data”—...

Do you have strong programming skills but need research experience doing meaningful & exciting CSS projects before heading off to a top graduate school for a computational social science PhD? Apply now to predoc with me,
@dggoldst.bsky.social @jakehofman.bsky.social www.microsoft.com/en-us/resear...

9 months ago
Screenshot of first page of paper. It is here: https://arxiv.org/pdf/2507.00828

Abstract: Topic model and document-clustering evaluations either use automated metrics that align poorly with human preferences or require expert labels that are intractable to scale. We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners' real-world usage of models. Annotators -- or an LLM-based proxy -- review text items assigned to a topic or cluster, infer a category for the group, then apply that category to other documents. Using this protocol, we collect extensive crowdworker annotations of outputs from a diverse set of topic models on two datasets. We then use these annotations to validate automated proxies, finding that the best LLM proxies are statistically indistinguishable from a human annotator and can therefore serve as a reasonable substitute in automated evaluations.


Evaluating topic models (and document clustering methods) is hard. In fact, since our paper critiquing standard evaluation practices four years ago, there hasn't been a good replacement metric.

That ends today (we hope)! Our new ACL paper introduces an LLM-based evaluation protocol 🧵

9 months ago
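The protocol described in the abstract can be sketched concretely. The snippet below is a toy illustration, not the paper's code: `annotate`, the document samples, and the plain accuracy scoring are hypothetical stand-ins for the crowdworker (or LLM-proxy) pipeline.

```python
from collections import Counter

def infer_category(sample_docs, annotate):
    """Infer a single category for a cluster from a sample of its documents."""
    votes = Counter(annotate(doc) for doc in sample_docs)
    return votes.most_common(1)[0][0]

def protocol_score(clusters, gold, annotate, sample_size=3):
    """Score a clustering: infer a category per cluster from a few documents,
    then check that category against gold labels of the held-out rest.

    clusters: {cluster_id: [doc, ...]}; gold: {doc: label}.
    """
    correct = total = 0
    for docs in clusters.values():
        category = infer_category(docs[:sample_size], annotate)
        for doc in docs[sample_size:]:  # apply the category to held-out docs
            correct += gold[doc] == category
            total += 1
    return correct / total if total else 0.0
```

A better clustering groups documents so that the inferred category transfers cleanly to the held-out ones, which is what the score rewards.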
A Co-op for Computing: Faculty are diving into the exciting, data-crunching AI world of GPMoo.

Honored to have my research, grant, and GPU cluster featured in Williams Magazine. today.williams.edu/magazine/a-c...

10 months ago
A screenshot of a paper showing the title: "Pairscale: Analyzing Attitude Change in Online Communities" by Rupak Sarkar, Patrick Wu, Kristina Miler, Alexander Hoyle and Philip Resnik.


Are you tired of using traditional stance detection to measure the polarity of text? Our #NAACL25 paper proposes an approach that uses pairwise comparisons to order texts on a continuous scale, capturing both implicit and explicit evidence in language.

📍Today in Hall 3 from 4-5:30pm

Come say hi!

11 months ago
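For intuition about the pairwise-comparison idea (this is not the paper's actual method): a standard way to turn pairwise judgments into a continuous scale is the Bradley-Terry model, sketched here with the classic MM update on made-up comparison data.

```python
def bradley_terry(n_items, comparisons, iters=200):
    """Fit Bradley-Terry strengths from pairwise outcomes via MM updates.

    comparisons: list of (winner, loser) index pairs.
    Returns one positive score per item; higher means ranked higher.
    """
    wins = [0.0] * n_items
    pair_counts = {}  # unordered pair -> number of times compared
    for w, l in comparisons:
        wins[w] += 1
        key = (min(w, l), max(w, l))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            denom = 0.0
            for (a, b), n in pair_counts.items():
                if i == a:
                    denom += n / (p[i] + p[b])
                elif i == b:
                    denom += n / (p[i] + p[a])
            new_p.append(wins[i] / denom if denom else p[i])
        scale = n_items / sum(new_p)  # fix the scale (BT is scale-invariant)
        p = [x * scale for x in new_p]
    return p
```

Texts compared pairwise by annotators (or an LLM judge) can then be placed on the resulting continuous scale.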

Yes! At Session 1 in Hall 3 tomorrow, 4-5:30 PM (CSS track)

11 months ago

Hi Maria! At NAACL too, let’s catch up!

11 months ago

Check out Neha’s Outstanding Paper Award 🏆 winning research on atomic hypothesis decomposition in Session C at 2 pm today!!

#NAACL2025

11 months ago

I'll be presenting this work with @rachelrudinger at #NAACL2025 tomorrow (Wednesday 4/30) in Albuquerque during Session C (Oral/Poster 2) at 2pm! 🔬

Decomposing hypotheses in traditional NLI and defeasible NLI helps us measure various forms of consistency of LLMs. Come join us!

11 months ago
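As a toy illustration of the kind of consistency check that atomic hypothesis decomposition enables (the `entails` function below is a hypothetical stand-in for a model's entailment judgment, not anything from the paper): a model that entails a whole hypothesis should also entail each of its atomic parts.

```python
def decomposition_consistent(premise, hypothesis, atoms, entails):
    """Check one consistency property enabled by atomic decomposition:
    if the model entails the full hypothesis, it should also entail
    each atomic part. Returns False on a violation.
    """
    if not entails(premise, hypothesis):
        return True  # no commitment to the whole, nothing to check
    return all(entails(premise, atom) for atom in atoms)
```

The same pattern extends to defeasible NLI by checking whether an update strengthens or weakens each atom coherently.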

🚨 New Paper 🚨

1/ We often assume that well-written text is easier to translate ✏️

But can #LLMs automatically rewrite inputs to improve machine translation? 🌍

Here’s what we found 🧵

1 year ago

I have so many questions. Why not write a Python script that does the same thing instead of writing a React app? Why not just answer with 3?!

1 year ago
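For what it's worth, the Python script the comment has in mind really is a one-liner:

```python
# Count the r's in "strawberry" directly, no UI required.
word = "strawberry"
print(word.count("r"))  # prints 3
```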

So you're saying the next iteration of the model might generate a video of a person counting the number of R's in strawberry to tell me the answer? :P

1 year ago
React app made by Claude that highlights the r's in strawberry


I asked Claude 3.7 to count the number of r's in strawberry ("count the number of r's in strawberry for me") and it wrote a React app that displays a strawberry, which you click to enumerate the number of r's.

Wild. Wondering what kind of alignment policies led to this.

1 year ago