New paper! The Linear Representation Hypothesis is a powerful intuition for how language models work, but lacks formalization. We give a mathematical framework in which we can ask and answer a basic question: how many features can be stored under the hypothesis? 🧵 arxiv.org/abs/2602.11246
Posts by Raj Movva
Excited to share a new working paper!
What happened when Change.org integrated an AI writing tool into their platform? We provide causal evidence that petition text changed significantly while outcomes did not improve. 1/
arxiv.org/abs/2511.13949
New #NeurIPS2025 paper: how should we evaluate machine learning models without a large, labeled dataset? We introduce Semi-Supervised Model Evaluation (SSME), which uses labeled and unlabeled data to estimate performance! We find SSME is far more accurate than standard methods.
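To give a flavor of the labeled+unlabeled idea (this is an illustrative simplification, not SSME itself — the function name and binning scheme here are my own): one can calibrate model confidences on the small labeled set, then average the calibrated values over the unlabeled pool to estimate accuracy.

```python
import numpy as np

def accuracy_via_calibrated_confidence(labeled_conf, labeled_correct,
                                       unlabeled_conf, n_bins=10):
    """Toy performance estimator: bin confidences on the labeled set,
    compute per-bin accuracy, then average those bin accuracies over
    the unlabeled pool. (Illustrative only, not the SSME estimator.)"""
    labeled_conf = np.asarray(labeled_conf, float)
    labeled_correct = np.asarray(labeled_correct, float)
    unlabeled_conf = np.asarray(unlabeled_conf, float)

    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx_l = np.clip(np.digitize(labeled_conf, bins) - 1, 0, n_bins - 1)
    idx_u = np.clip(np.digitize(unlabeled_conf, bins) - 1, 0, n_bins - 1)

    est = np.empty(len(unlabeled_conf))
    for b in range(n_bins):
        mask = idx_l == b
        if mask.any():
            # Labeled examples tell us how accurate this confidence bin is.
            est[idx_u == b] = labeled_correct[mask].mean()
        else:
            # No labeled data in this bin: fall back to raw confidence.
            est[idx_u == b] = unlabeled_conf[idx_u == b]
    return est.mean()
```

Even this toy version shows why unlabeled data helps: the accuracy estimate draws on the whole unlabeled pool rather than only the few labeled points.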
I am on the job market this year! My research advances methods for reliable machine learning from real-world data, with a focus on healthcare. Happy to chat if this is of interest to you or your department/team.
I've been working for many months on this article on Silicon Valley's under-the-radar role in bringing AI into schools across the US. I really hope you'll read it — here's a gift link — but I'll tell you some of the highlights in this thread. (1/x)
🚨 New postdoc position in our lab at Berkeley EECS! 🚨
(please reshare)
We seek applicants with experience in language modeling who are excited about high-impact applications in the health and social sciences!
More info in thread
1/3
What a crossover!
This is great, and there's a clear analogy to the burgeoning mechanism-design community for AI alignment: who is providing RLHF votes? Do their preferences reflect yours? Discussions about social choice and collective constitutions are interesting, but "what and who is in the data" is just as important.
This is amazing
They're in their move fast and break things era 🙃
This take emerged organically from just how well our SAE-based method for hypothesis generation (HypotheSAEs) performed, which surprised all of us!
See the paper arxiv.org/abs/2506.23845
Thanks @kennypeng.bsky.social, Jon, @emmapierson.bsky.social, @nkgarg.bsky.social for another nice collaboration.
This capability of discovering unknown concepts opens many opportunities for applied machine learning. We can design better whitebox predictors, better audit high-stakes models for bias, and generate hypotheses for CSS research. More broadly, SAEs can help bridge the "prediction-explanation" gap.
These tasks stand in contrast to probing, where we're trying to predict the presence of a *known* concept, and steering, where we're trying to include a *known* concept in an LLM output. SAEs lose to simple baselines on these tasks. (2 good papers on this: "AxBench" and Kantamneni, Engels et al. 2025)
How do we reconcile our view with recent negative results? Our key distinction is that SAEs are useful when you don't know what you're looking for: how does my text classifier predict which headlines will go viral? How does my LLM perform addition? These are "unknown unknowns".
📢New POSITION PAPER: Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts
Despite recent results, SAEs aren't dead! They can still be useful to mech interp, and also much more broadly: across FAccT, computational social science, and ML4H. 🧵
Nice work! Cool to see that item difficulty predicts human-llm disagreement. We also studied similar questions with the DICES dataset: aclanthology.org/2024.emnlp-m...
Heat map showing that more accurate models have more correlated errors.
Are LLMs correlated when they make mistakes? In our new ICML paper, we answer this question using responses of >350 LLMs. We find substantial correlation. On one dataset, LLMs agree on the wrong answer ~2x more than they would at random. 🧵(1/7)
arxiv.org/abs/2506.07962
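The headline statistic can be sketched in a few lines (function and variable names here are my own, not from the paper): for every pair of models, look at examples where both are wrong and ask how often they picked the *same* wrong answer. Under independent uniform errors over k wrong options, the baseline agreement rate would be 1/k.

```python
from itertools import combinations

def joint_wrong_agreement(preds_by_model, gold):
    """Among example-pairs where two models are BOTH wrong, return the
    fraction where they give the same wrong answer. Compare against
    the chance rate 1/k for k wrong options to gauge error correlation."""
    models = list(preds_by_model)
    agree = total = 0
    for a, b in combinations(models, 2):
        for pa, pb, g in zip(preds_by_model[a], preds_by_model[b], gold):
            if pa != g and pb != g:  # both models err on this example
                total += 1
                agree += (pa == pb)  # ...and agree on the wrong answer
    return agree / total if total else 0.0
```

An agreement rate well above 1/k — as we find — means models' mistakes are correlated, which matters for ensembling and for using LLMs as judges.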
ARR question: If I submit to a cycle, how long do those reviews "last"? e.g. if I submit to the July cycle but can't go to AACL, can I commit my July reviews to the conference associated with the next (October) cycle? @aclrollingreview.bsky.social
A gif explaining the value of test-time augmentation to conformal classification. The video begins with an illustration of TTA reducing the size of the predicted set of classes for a dog image, and goes on to explain that this is because TTA promotes the true class's predicted probability to be higher, even when it's predicted to be unlikely.
New work 🎉: conformal classifiers return sets of classes for each example, with a probabilistic guarantee the true class is included. But these sets can be too large to be useful.
In our #CVPR2025 paper, we propose a method to make them more compact without sacrificing coverage.
I would like to spend up to 5-10 hours to learn about basic macroeconomics (I know it's maybe fake, but setting that aside for a moment...). Does anyone have any recommendations?
Huge congrats, Marianne!!
I find that I've actually gone out of my way to stop using bullet points in reviews now because Any Review With Bullet Points is a Bot 🥲
People love to hate on the transition 3-pointer as evidence of how the 3 has ruined basketball, but I think it's usually just the right play... if you have numbers in transition, your teammate can easily get a putback off a miss, so might as well try the 3
We'll present HypotheSAEs at ICML this summer! 🎉
Draft: arxiv.org/abs/2502.04382
We're continuing to cook up new updates for our Python package: github.com/rmovva/Hypot...
(Recently, "Matryoshka SAEs", which help extract coarse and granular concepts without as much hyperparameter fiddling.)
So awesome, congrats Lucy!!! 🧀
Yesterday's Game 6 was depressing, and this article precisely delineated the reasons why. And sometimes, a precise retelling of what you're feeling is all you need to feel better. www.nytimes.com/athletic/633... @thompsonscribe.bsky.social
Did you take the hot air balloon pic?!
Check out Erica's nice work. They not only develop a well-grounded model for disparities in disease progression, but also conduct experiments with real NYP cardiology data! (Anyone who works in healthcare knows how much of a feat it is to use data other than MIMIC)