I've recently been getting invitations to talk about how to use AI tools to assist with TCS research. It's something I've been doing a lot, but I don't have structured thoughts about how to explain the process. But I'm going to try -- the first such talk is tomorrow: t.co/wlHPBzXzDm
AI Agents like Codex are very good at figuring out taxes, including obscure local ones that Intuit doesn't bother with (looking at you, Philadelphia local taxes). Businesses that provide financial/legal services that involve reasoning through dense but public documentation are in trouble.
Announcing the #ICML2026 tutorials!
All ten tutorials will be presented the first day of the conference, Monday July 6.
Read the blog post for more details on the selection process!
blog.icml.cc/2026/04/02/a...
So many interesting things here. (N.b. I get to think about interfaces for this all day long at work :))
One thing I find interesting here is how similar the real work of science is to that of the humanities, both of which are centered around human judgement about what is relevant and interesting.
Alpha_0 joke from Dogman
Very cool work. Empirical science has many researcher degrees of freedom, which makes specific studies hard to interpret --- each is only a single trajectory through the data-analysis multiverse. Human researchers are opaque. But with agents you can explore the whole space!
My favorite translation use case is from informal English to precise mathematics.
The paper is here: arxiv.org/abs/2602.23360 and is joint work with Eric Eaton, @surbhigoel.bsky.social, @marcelhussing.bsky.social, @mkearnsphilly.bsky.social, @sikatasengupta.bsky.social and @optimistsinc.bsky.social
Via this simple argument, we get out-of-the-box agreement theorems about neural networks (parameterized by size), trees (parameterized by depth), gradient boosting (parameterized by iterations), and stacking (parameterized by ensemble size). No distributional assumptions needed.
No matter what the Bayes error is, no matter how complicated the problem, the learning curve is monotonically decreasing and bounded above and below. So no matter what its shape is, it can't avoid "flatness" for a long stretch. Whenever you get flatness you get agreement.
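To make the flatness step explicit (my sketch of the pigeonhole argument, not notation from the paper): writing $\mathrm{OPT}_n$ for the best achievable error at complexity $n$, the total drop across $T$ doublings telescopes and is bounded by $\mathrm{OPT}_n$, so some doubling step must be nearly flat:

$$\min_{1 \le t \le T} \big(\mathrm{OPT}_{2^{t-1} n} - \mathrm{OPT}_{2^{t} n}\big) \;\le\; \frac{\mathrm{OPT}_n - \mathrm{OPT}_{2^T n}}{T} \;\le\; \frac{\mathrm{OPT}_n}{T}.$$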
This lets us control out-of-the-box disagreement by the "local learning curve" --- how much can you decrease error by doubling the complexity of your model class? If the answer is not much, then you get out-of-the-box agreement. The magic is that this is guaranteed to happen.
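In symbols (a sketch, assuming the squared-loss form of the ambiguity decomposition; the paper's statement may be more general): if $f_1$ and $f_2$ are each $\varepsilon$-optimal at complexity $n$ and their average $\bar f = (f_1 + f_2)/2$ has complexity at most $2n$, then $\mathrm{err}(\bar f) \ge \mathrm{OPT}_{2n}$, so

$$\tfrac{1}{4}\,\mathbb{E}\big[(f_1 - f_2)^2\big] \;=\; \tfrac{1}{2}\big(\mathrm{err}(f_1) + \mathrm{err}(f_2)\big) - \mathrm{err}(\bar f) \;\le\; \mathrm{OPT}_n + \varepsilon - \mathrm{OPT}_{2n}.$$

The right-hand side is exactly the drop in the local learning curve (plus the optimization slack): flat curve, small disagreement.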
The more interesting case: most model classes are not convex. But we can parameterize a family of model classes by a measure of complexity: size, depth, etc. Usually, the average of two models of complexity n is a model in the same class but with higher complexity, say 2n.
An easy case: If our shared model class is convex, then the average of our two models is also in our model class. So if both of us trained to approximate optimality within our model class, there just isn't room for improvement within the same class; we have to mostly agree.
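Concretely (a sketch, with squared loss and $\mathrm{OPT}$ the best error in the convex class): if $f_1$ and $f_2$ are both $\varepsilon$-optimal, their average $\bar f$ stays in the class, so $\mathrm{err}(\bar f) \ge \mathrm{OPT}$, and the ambiguity decomposition gives

$$\tfrac{1}{4}\,\mathbb{E}\big[(f_1 - f_2)^2\big] \;=\; \tfrac{1}{2}\big(\mathrm{err}(f_1) + \mathrm{err}(f_2)\big) - \mathrm{err}(\bar f) \;\le\; (\mathrm{OPT} + \varepsilon) - \mathrm{OPT} \;=\; \varepsilon.$$

So the two models can differ by at most $4\varepsilon$ in expected squared prediction distance.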
But we can think about it the other way around. What if I imagined the ensemble of your model and mine. If this imagined ensemble would not substantially decrease error, then it must be that your model and mine already mostly agree. When does averaging not reduce error?
The "Ambiguity Decomposition" is a fact that comes from ensembling theory. It says that if I average two models, the error of the ensemble is equal to the average of the model errors, less the "disagreement" of the two models. The goal is usually low error for the ensemble.
Neural networks are highly non-convex, so approximate error minimizers need not look anything like each other in parameter space. But we show that nevertheless (for many model sizes) approximate error minimizers must closely agree in function/prediction space despite this!
Agree it's a necessary component, just not a sufficient one. As AI progresses the problem of it producing incorrect proofs will diminish, but that doesn't solve the problem of overloading our publication venues beyond their capacity to serve as effective attention filters.
AI is bringing a sea change in scientific research methodology, training, and peer review. Amazon Scholars and Penn professors @mkearnsphilly.bsky.social and @aaroth.bsky.social on what agentic AI tools mean for the next generation of researchers.
Michael (@mkearnsphilly.bsky.social) and I wrote a blog post about our experiences using AI for research, and our thoughts on what these developments will mean for research, publication, and education: www.amazon.science/blog/how-ai-...
Which is the better model for math? GPT 5.2, or GPT 5.3 codex (either one on high reasoning)?
Sometimes you gotta split the difference. From Aaron Roth's (@aaroth.bsky.social) plenary talk at #ALT2026
Did you just miss punching your ticket to Rio or Salt Lake City? Wanna go to a conference where people will engage with you and your paper on foundations of responsible computing, and you won't get lost in the crowd?
Submit to #FORC2026, in Boston in June! Deadline in 2 weeks.
The other paper accepted to @iclr-conf.bsky.social 2026 🇧🇷. Our work on replicable RL sheds some light on how to consistently make decisions in RL.
@ericeaton.bsky.social @mkearnsphilly.bsky.social @aaroth.bsky.social @sikatasengupta.bsky.social @optimistsinc.bsky.social
I try to avoid posting about politics here, but I feel compelled to say some things that should be obvious: 🧵
The NSF has played a key role in American science, and risks being collateral damage in the war against science.
#econsky #academicsky #NSF #science
marketdesigner.blogspot.com/2026/01/hist...
The paper is here: arxiv.org/abs/2601.05245 It's joint work with @ncollina.bsky.social, Jiuyao Lu, and George Noarov. Natalie and George are on the job market --- check them out. www.seas.upenn.edu/~ncollina/ noarov.com
Not sure of the details, but I believe it's related to the experiment that STOC ran giving feedback with a version of Gemini Deep Think, which got generally positive reviews for critiquing math: research.google/blog/gemini-...
What's wrong with providing access to a fancy LLM to give feedback to authors about their own papers?
But we ended up showing that this is impossible in full generality. The results in the paper also lay out a slightly more nuanced landscape, and there remain some interesting open questions about the power of reductions from multicalibration to marginal calibration. Take a look!