Manfred Diaz (@manfreddiaz) Bsky

New paper: “𝐀 𝐓𝐡𝐞𝐨𝐫𝐲 𝐨𝐟 𝐀𝐩𝐩𝐫𝐨𝐩𝐫𝐢𝐚𝐭𝐞𝐧𝐞𝐬𝐬 𝐓𝐡𝐚𝐭 𝐀𝐜𝐜𝐨𝐮𝐧𝐭𝐬 𝐟𝐨𝐫 𝐍𝐨𝐫𝐦𝐬 𝐨𝐟 𝐑𝐚𝐭𝐢𝐨𝐧𝐚𝐥𝐢𝐭𝐲”

Agent-based models of social order work better when agents act by predictive pattern completion from prefix (culture/context) to suffix (action) than when they act through expected value maximization

4 weeks ago

belated happy birthday, Marc!

2 months ago

Hello all! 👋

I’m delighted to share a 🚨 new preprint 🚨:

“Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms”.

A paper thread! 🤩📄🧵 1/N

3 months ago

Merry Christmas! ☃️🌲

3 months ago

Maybe the general intelligence has always been behind the algorithm or the prompt? No publicly available eval seems safe from researchers overfitting to it.

4 months ago

It hasn't disappointed thus far!

6 months ago

@sharky6000.bsky.social this may be of interest!

8 months ago

I was following this one during the COVID pandemic, but it has been inactive for quite some time. The original talks' recordings are amazing, though!

10 months ago

Yeah, it's been a hectic period for all of us at the same time! I have also been pretty busy with my thesis and job search. Hopefully, it will be back up and running in the Fall term!

10 months ago

@aamasconf.bsky.social 2025 was very special for us! We had the opportunity to present a tutorial on general evaluation of AI agents, and we got a best paper award! Congrats, @sharky6000.bsky.social and the team! 🎉

10 months ago

In the afternoon we will be giving a tutorial on general evaluation of AI agents.

sites.google.com/view/aamas20... 10/N

11 months ago

Announcing our latest arxiv paper:

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt
arxiv.org/abs/2505.05197

We argue for a view of AI safety centered on preventing disagreement from spiraling into conflict.

11 months ago

Congrats, Seth!

11 months ago

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt — LessWrong We can just drop the axiom of rational convergence.

First LessWrong post! Inspired by Richard Rorty, we argue for a different view of AI alignment, where the goal is "more like sewing together a very large, elaborate, polychrome quilt" than "like getting a clearer vision of something true and deep".
www.lesswrong.com/posts/S8KYwt...

11 months ago

The quality of London's museums is just amazing! Enjoy!

1 year ago

A Theory of Appropriateness with Applications to Generative Artificial Intelligence YouTube video by MITCBMM

In case folks are interested, here's a video of a talk I gave at MIT a couple of weeks ago: youtu.be/FmN6fRyfcsY?...

1 year ago

Our new evaluation method, Soft Condorcet Optimization is now available open-source! 👍

Both the sigmoid (smooth Kendall-tau) and Fenchel-Young (perturbed optimizers) versions.

Also, an optimized C++ implementation that is ~40X faster than the Python one. 🤩⚡

github.com/google-deepm...
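For readers curious what "sigmoid (smooth Kendall-tau)" means in practice, here is a minimal sketch. It assumes each vote is a full ranking of agents, best first. It also assumes the sigmoid variant replaces each pairwise Kendall-tau disagreement indicator 1[θ_b > θ_a] (for a vote preferring a over b) with σ((θ_b − θ_a)/τ), minimized by plain gradient descent. The function names, the temperature `tau`, and the learning-rate settings are hypothetical illustrations, not the API or defaults of the open-source release above.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sco_sigmoid_loss_and_grad(
    ratings: Dict[str, float], votes: List[List[str]], tau: float = 1.0
) -> Tuple[float, Dict[str, float]]:
    """Hypothetical smoothed Kendall-tau loss; each vote is a full ranking, best first."""
    loss = 0.0
    grad = {a: 0.0 for a in ratings}
    for ranking in votes:
        for i in range(len(ranking)):
            for j in range(i + 1, len(ranking)):
                a, b = ranking[i], ranking[j]  # this vote prefers a over b
                # Kendall-tau would count 1 if theta_b > theta_a; the sigmoid smooths that step.
                s = sigmoid((ratings[b] - ratings[a]) / tau)
                loss += s
                g = s * (1.0 - s) / tau  # derivative of s with respect to theta_b
                grad[b] += g
                grad[a] -= g
    return loss, grad


def fit_sco(agents: List[str], votes: List[List[str]], tau: float = 1.0,
            lr: float = 0.1, steps: int = 500) -> Dict[str, float]:
    """Plain gradient descent from all-zero ratings (an assumed, simple optimizer)."""
    ratings = {a: 0.0 for a in agents}
    for _ in range(steps):
        _, grad = sco_sigmoid_loss_and_grad(ratings, votes, tau)
        for a in ratings:
            ratings[a] -= lr * grad[a]
    return ratings


votes = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
ratings = fit_sco(["A", "B", "C"], votes)
```

With these three toy votes, A beats B in 2 of 3 votes, A beats C in all 3, and B beats C in 2 of 3. The fitted ratings therefore come out ordered A > B > C. Because the loss only penalizes pairwise disagreements, the scale of the ratings is not meaningful on its own, only their order.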

1 year ago

SCaLA-25 A workshop connecting research topics in social choice and learning algorithms.

Working at the intersection of social choice and learning algorithms?

Check out the 2nd Workshop on Social Choice and Learning Algorithms (SCaLA) at @ijcai.bsky.social this summer.

Submission deadline: May 9th.

I attended last year's edition at AAMAS and loved it! 👍

sites.google.com/corp/view/sc...

1 year ago

Judging by the AAMAS website, if it's a good reference for this, it may not be, but I'm uncertain at the moment.

1 year ago

Come to understand ML evaluation from first principles! We have put together a great AAMAS tutorial covering statistics, probabilistic models, game theory, and social choice theory.

Bonus: a unifying perspective on the problem, leveraging decision-theoretic principles!

Join us on May 19th!

1 year ago

Re #2: The key finding there is that the stationary points of SCO contain the margin matrix but, as I said in the note, there is still more work to do!
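The reply above does not define "margin matrix". The sketch below assumes it means the standard social-choice object: entry (a, b) is the number of votes ranking a above b minus the number ranking b above a. The sketch only shows how such a matrix is built from full rankings and how a Condorcet winner can be read off it. It does not reproduce the stationary-point result from the note. Function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional


def margin_matrix(agents: List[str], votes: List[List[str]]) -> List[List[int]]:
    """Pairwise majority margins; assumes every vote ranks every agent, best first."""
    idx = {a: i for i, a in enumerate(agents)}
    m = [[0] * len(agents) for _ in agents]
    for ranking in votes:
        pos = {a: k for k, a in enumerate(ranking)}
        for a, b in combinations(agents, 2):
            winner, loser = (a, b) if pos[a] < pos[b] else (b, a)
            m[idx[winner]][idx[loser]] += 1
            m[idx[loser]][idx[winner]] -= 1
    return m


def condorcet_winner(agents: List[str], m: List[List[int]]) -> Optional[str]:
    """An agent with a strictly positive margin against every other agent, if one exists."""
    for i, a in enumerate(agents):
        if all(m[i][j] > 0 for j in range(len(agents)) if j != i):
            return a
    return None


agents = ["A", "B", "C"]
votes = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
m = margin_matrix(agents, votes)
```

For these votes the matrix is `[[0, 1, 3], [-1, 0, 1], [-3, -1, 0]]`, so A is the Condorcet winner. The matrix is antisymmetric by construction.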

1 year ago

Thanks! I have been meaning to update the manuscript so it stands alone without the main paper, but instead I may change the content to a different format 😉. Coming soon!

1 year ago

Ah, I see the confusion... I never used the "identically distributed assumption," only the independence assumption (from 8 to 9).

1 year ago

I'm not sure if I understood your question correctly, but yes? As the post you shared says, "Voila! We have shown that minimizing the KL divergence amounts to finding the maximum likelihood estimate of θ." Maybe I am missing your point 😬
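The equivalence quoted in the reply is a standard result. Here is a short derivation, where \hat p denotes the empirical distribution of samples x_1, ..., x_N and p_\theta the model (this notation is assumed here, not taken from the notes). The entropy H(\hat p) does not depend on \theta, so it drops out of the optimization. The remaining sum is the log-likelihood of the sample under independence, since independence is what lets the joint log-likelihood split into a sum of per-sample terms.

```latex
\begin{aligned}
D_{\mathrm{KL}}(\hat p \,\|\, p_\theta)
  &= \sum_x \hat p(x) \log \hat p(x) - \sum_x \hat p(x) \log p_\theta(x),
  \qquad \hat p(x) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[x_i = x] \\
  &= -H(\hat p) - \frac{1}{N} \sum_{i=1}^{N} \log p_\theta(x_i) \\
\Rightarrow\quad \arg\min_\theta D_{\mathrm{KL}}(\hat p \,\|\, p_\theta)
  &= \arg\max_\theta \sum_{i=1}^{N} \log p_\theta(x_i)
\end{aligned}
```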

1 year ago

Elo drives most LLM evaluations, but we often overlook its assumptions, benefits, and limitations. While working on SCO, we wanted to understand how SCO differs from Elo, so I dug in, found some intriguing results, and documented them in these notes. I hope you find them valuable!
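For reference, this is the textbook Elo update the post refers to. The main modeling assumption is that the expected score follows a logistic curve in the rating difference. The sketch uses the conventional base-10, 400-point scale and K = 32. Those constants are common chess defaults assumed here, not values taken from the notes.

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the logistic (base-10, 400-point) Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple:
    """One game: score_a is 1 for an A win, 0.5 for a draw, 0 for a loss."""
    delta = k * (score_a - elo_expected(r_a, r_b))
    # Zero-sum: whatever A gains, B loses.
    return r_a + delta, r_b - delta


new_a, new_b = elo_update(1500.0, 1500.0, 1.0)
```

Two equally rated players have an expected score of 0.5 each, so a win with K = 32 moves the ratings to 1516 and 1484. A 400-point gap corresponds to an expected score of 1/1.1, about 0.909, for the stronger player.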

1 year ago

Looking for a principled evaluation method for ranking of *general* agents or models, i.e. that get evaluated across a myriad of different tasks?

I’m delighted to tell you about our new paper, Soft Condorcet Optimization (SCO) for Ranking of General Agents, to be presented at AAMAS 2025! 🧵 1/N

1 year ago

I had the convexity results for the online pairwise update (Section B.1.1.1) in my notes (manfreddiaz.github.io/assets/pdf/s...), but it is not clear to me if they hold for the other non-online settings. Worth taking a more detailed pass over the paper!

1 year ago

That's a nice finding, @sacha2.bsky.social! @sharky6000.bsky.social I skimmed over it, and it seems neat! There is an important distinction, though. They work with the "online" Elo regime, departing from the traditional gradient/batch gradient descent updates. (e.g., FIDE doesn't use online updates)
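To make the online versus batch distinction concrete, here is a sketch under the standard logistic Elo model. In this model the per-game update K(s − p) is, up to a constant absorbed into K, a gradient step on that game's Bernoulli log-likelihood. The online regime applies these steps game by game. The batch version sums the gradients over all games at fixed ratings and then takes one step. Both functions and the two-game example are hypothetical illustrations, not code from the paper or from FIDE.

```python
import math
from typing import Dict, List, Tuple

# Rescales rating differences so p_win matches 1 / (1 + 10 ** ((r_b - r_a) / 400)).
SCALE = 400.0 / math.log(10)

Game = Tuple[str, str, float]  # (player a, player b, score of a: 1, 0.5, or 0)


def p_win(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + math.exp(-(r_a - r_b) / SCALE))


def online_elo(ratings: Dict[str, float], games: List[Game], k: float = 32.0) -> Dict[str, float]:
    """Online regime: each game updates the ratings used by the next game."""
    r = dict(ratings)
    for a, b, s in games:
        delta = k * (s - p_win(r[a], r[b]))
        r[a] += delta
        r[b] -= delta
    return r


def batch_elo(ratings: Dict[str, float], games: List[Game], k: float = 32.0) -> Dict[str, float]:
    """Batch regime: one full-batch gradient step, all games scored at the same ratings."""
    r = dict(ratings)
    grad = {x: 0.0 for x in r}
    for a, b, s in games:
        d = s - p_win(r[a], r[b])
        grad[a] += d
        grad[b] -= d
    for x in r:
        r[x] += k * grad[x]
    return r


start = {"A": 1500.0, "B": 1500.0}
games = [("A", "B", 1.0), ("B", "A", 1.0)]  # one win each
online = online_elo(start, games)
online_reversed = online_elo(start, list(reversed(games)))
batch = batch_elo(start, games)
```

With one win each, the batch step leaves both players at 1500. The online result depends on game order: whoever wins the second game ends slightly ahead, because that game is scored against already-updated ratings.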

1 year ago

lol 😀

1 year ago

Michael I. Jordan - Wikipedia

Not that Michael Jordan, but this one en.wikipedia.org/wiki/Michael...

1 year ago