New paper: “A Theory of Appropriateness That Accounts for Norms of Rationality”
Agent-based models of social order work better when agents act by predictive pattern completion from prefix (culture/context) to suffix (action) than when they act through expected value maximization
Posts by Manfred Diaz
belated happy birthday, Marc!
Hello all!
I'm delighted to share a 🚨 new preprint 🚨:
“Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms”.
A paper thread! 🤩🧵 1/N
Merry Christmas!
Maybe the general intelligence has always been behind the algorithm or the prompt? No publicly available eval seems to be safe from researchers overfitting.
It hasn't disappointed thus far!
@sharky6000.bsky.social this may be of interest!
I was following this one during the COVID pandemic, but it has been inactive for quite some time. The original talks' recordings are amazing, though!
Yeah, it's been a period for all of us simultaneously! I have also been pretty busy with thesis/job search. Hopefully, it will be back running in the Fall term!
@aamasconf.bsky.social 2025 was very special for us! We had the opportunity to present a tutorial on general evaluation of AI agents, and we got a best paper award! Congrats, @sharky6000.bsky.social and the team!
In the afternoon we will be giving a tutorial on general evaluation of AI agents.
sites.google.com/view/aamas20... 10/N
Announcing our latest arxiv paper:
Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt
arxiv.org/abs/2505.05197
We argue for a view of AI safety centered on preventing disagreement from spiraling into conflict.
Congrats, Seth!
First LessWrong post! Inspired by Richard Rorty, we argue for a different view of AI alignment, where the goal is "more like sewing together a very large, elaborate, polychrome quilt", than it is "like getting a clearer vision of something true and deep"
www.lesswrong.com/posts/S8KYwt...
The quality of London's museums is just amazing! Enjoy!
In case folks are interested, here's a video of a talk I gave at MIT a couple weeks ago: youtu.be/FmN6fRyfcsY?...
Our new evaluation method, Soft Condorcet Optimization, is now available open-source!
Both the sigmoid (smooth Kendall-tau) and Fenchel-Young (perturbed optimizers) versions.
Also, an optimized C++ implementation that is ~40X faster than the Python one. 🤩⚡
github.com/google-deepm...
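For readers unfamiliar with the "sigmoid (smooth Kendall-tau)" idea, here is a minimal sketch. It assumes the loss sums, over every vote and every ordered pair in that vote, a sigmoid of the rating gap in the "wrong" direction, so that each term approximates a 0/1 discordance count. The function names, the `temperature` parameter, and the toy votes are illustrative assumptions, not the API of the released library.

```python
import math
from itertools import combinations

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_kendall_tau_loss(ratings, votes, temperature=1.0):
    """Smooth count of pairs that the ratings order differently from the votes."""
    loss = 0.0
    for vote in votes:  # each vote lists agents from best to worst
        for i, j in combinations(range(len(vote)), 2):
            winner, loser = vote[i], vote[j]
            # Close to 1 when the loser is rated well above the winner.
            loss += sigmoid((ratings[loser] - ratings[winner]) / temperature)
    return loss

def gradient_step(ratings, votes, lr=0.5, temperature=1.0):
    """One full-batch gradient descent step on the loss above."""
    grad = {agent: 0.0 for agent in ratings}
    for vote in votes:
        for i, j in combinations(range(len(vote)), 2):
            winner, loser = vote[i], vote[j]
            s = sigmoid((ratings[loser] - ratings[winner]) / temperature)
            g = s * (1.0 - s) / temperature  # derivative of the sigmoid term
            grad[loser] += g
            grad[winner] -= g
    return {agent: ratings[agent] - lr * grad[agent] for agent in ratings}

# Toy data (hypothetical): "a" beats "b" 2-1, "a" beats "c" 3-0, "b" beats "c" 2-1.
votes = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
ratings = {"a": 0.0, "b": 0.0, "c": 0.0}
initial_loss = soft_kendall_tau_loss(ratings, votes)
for _ in range(100):
    ratings = gradient_step(ratings, votes)
final_loss = soft_kendall_tau_loss(ratings, votes)
ranking = sorted(ratings, key=ratings.get, reverse=True)
```

On this toy profile the learned ratings should recover the pairwise-majority order, which is the behavior the smooth relaxation is meant to encourage.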
Working at the intersection of social choice and learning algorithms?
Check out the 2nd Workshop on Social Choice and Learning Algorithms (SCaLA) at @ijcai.bsky.social this summer.
Submission deadline: May 9th.
I attended last year at AAMAS and loved it!
sites.google.com/corp/view/sc...
If the AAMAS website is a good reference for this, it may not be, but I'm uncertain atm.
Come to understand ML evaluation from first principles! We have put together a great AAMAS tutorial covering statistics, probabilistic models, game theory, and social choice theory.
Bonus: a unifying perspective of the problem leveraging decision-theoretic principles!
Join us on May 19th!
Re #2: The key finding there is that the stationary points of SCO contain the margin matrix but, as I said in the note, there is still more work to do!
Thanks! I have been meaning to update the manuscript so that it stands alone without the main paper, but instead I may have to change the content to a different format. Coming soon!
Ah, I see the confusion... I never used the "identically distributed assumption," only the independence assumption (from 8 to 9).
I'm not sure if I understood your question correctly, but yes? As the post you shared says, "Voila! We have shown that minimizing the KL divergence amounts to finding the maximum likelihood estimate of θ." Maybe I am missing your point
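The quoted claim can be unpacked with the standard textbook derivation below. The symbols are assumptions introduced here for illustration and do not come from the thread: the data distribution is p_data, the model is p_theta, and x_1, ..., x_n are independent samples from p_data.

```latex
\begin{aligned}
\arg\min_{\theta} D_{\mathrm{KL}}\left(p_{\text{data}} \,\|\, p_{\theta}\right)
&= \arg\min_{\theta} \mathbb{E}_{x \sim p_{\text{data}}}\left[\log p_{\text{data}}(x) - \log p_{\theta}(x)\right] \\
&= \arg\max_{\theta} \mathbb{E}_{x \sim p_{\text{data}}}\left[\log p_{\theta}(x)\right] \\
&\approx \arg\max_{\theta} \frac{1}{n} \sum_{i=1}^{n} \log p_{\theta}(x_i)
\end{aligned}
```

The first term inside the expectation does not depend on the parameter, so it drops out of the optimization. The last line replaces the expectation with a sample average, and under independence that average is the log-likelihood of the sample up to the factor 1/n, so its maximizer is the maximum likelihood estimate.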
Elo drives most LLM evaluations, but we often overlook its assumptions, benefits, and limitations. While working on SCO, we wanted to understand the SCO-Elo distinction, so I dug in, uncovered some intriguing findings, and documented them in these notes. I hope you find them valuable!
Looking for a principled evaluation method for ranking *general* agents or models, i.e., ones that get evaluated across a myriad of different tasks?
I'm delighted to tell you about our new paper, Soft Condorcet Optimization (SCO) for Ranking of General Agents, to be presented at AAMAS 2025! 🧵 1/N
I had the convexity results for the online pairwise update (Section B.1.1.1) in my notes (manfreddiaz.github.io/assets/pdf/s...), but it is not clear to me if they hold for the other non-online settings. Worth taking a more detailed pass over the paper!
That's a nice finding, @sacha2.bsky.social! @sharky6000.bsky.social I skimmed over it, and it seems neat! There is an important distinction, though. They work with the "online" Elo regime, departing from the traditional gradient/batch gradient descent updates. (e.g., FIDE doesn't use online updates)
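To make the online-versus-batch distinction concrete, here is a minimal sketch. It assumes the standard logistic Elo expectation with a scale of 400 and a K-factor of 16; both constants, the function names, and the toy games are illustrative choices, not anything taken from the linked paper. The batch variant accumulates the Elo error over all games before applying it, which matches a batch gradient step on the logistic log-likelihood up to a constant factor absorbed into K.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Elo's predicted probability that A beats B (logistic in the rating gap)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))

def online_elo(games, ratings, k=16.0):
    """Online regime: update both ratings immediately after every game."""
    r = dict(ratings)
    for a, b, score_a in games:
        delta = k * (score_a - expected_score(r[a], r[b]))
        r[a] += delta
        r[b] -= delta
    return r

def batch_elo(games, ratings, k=16.0, steps=1):
    """Batch regime: accumulate the error over all games, then apply it once per step."""
    r = dict(ratings)
    for _ in range(steps):
        grad = {player: 0.0 for player in r}
        for a, b, score_a in games:
            err = score_a - expected_score(r[a], r[b])
            grad[a] += err
            grad[b] -= err
        for player in r:
            r[player] += k * grad[player]
    return r

# Toy data (hypothetical): each tuple is (player_a, player_b, score_of_a).
games = [("x", "y", 1.0), ("x", "y", 1.0), ("y", "x", 1.0)]
start = {"x": 1500.0, "y": 1500.0}
online = online_elo(games, start)
batch = batch_elo(games, start)
```

The two regimes see the same games but end at different ratings, because the online version evaluates each expectation at ratings already moved by earlier games, so its result also depends on game order.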
lol