
Posts by Manfred Diaz


New paper: “A Theory of Appropriateness That Accounts for Norms of Rationality”

Agent-based models of social order work better when agents act by predictive pattern completion from prefix (culture/context) to suffix (action) than when they act through expected value maximization

4 weeks ago 34 11 4 1

belated happy birthday, Marc!

2 months ago 1 0 0 0

Hello all! 👋

I’m delighted to share a 🚨 new preprint 🚨:

“Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms”.

A paper thread! 🤩📄🧵 1/N

3 months ago 47 11 2 2

Merry Christmas! ☃️🌲

3 months ago 3 0 1 0

Maybe the general intelligence has always been behind the algorithm or the prompt? No publicly available eval seems to be safe from researchers overfitting.

4 months ago 0 0 0 0

It hasn't disappointed thus far!

6 months ago 0 0 0 0

@sharky6000.bsky.social this may be of interest!

8 months ago 4 0 1 0

I was following this one during the COVID pandemic, but it has been inactive for quite some time. The original talks' recordings are amazing, though!

10 months ago 1 0 1 0

Yeah, it's been a period for all of us simultaneously! I have also been pretty busy with thesis/job search. Hopefully, it will be back running in the Fall term!

10 months ago 1 0 0 0

@aamasconf.bsky.social 2025 was very special for us! We had the opportunity to present a tutorial on general evaluation of AI agents, and we got a best paper award! Congrats, @sharky6000.bsky.social and the team! 🎉

10 months ago 13 1 0 0
A Tutorial on General Evaluation of AI Agents Artificial Intelligence (AI) and machine learning (ML), in particular, have emerged as scientific disciplines concerned with understanding and building single and multi-agent systems with the ability ...

In the afternoon we will be giving a tutorial on general evaluation of AI agents.

sites.google.com/view/aamas20... 10/N

11 months ago 4 1 1 0
Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt Artificial Intelligence (AI) systems are increasingly placed in positions where their decisions have real consequences, e.g., moderating online spaces, conducting research, and advising on policy. Ens...

Announcing our latest arxiv paper:

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt
arxiv.org/abs/2505.05197

We argue for a view of AI safety centered on preventing disagreement from spiraling into conflict.

11 months ago 24 6 1 1

Congrats, Seth!

11 months ago 1 0 1 0
Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt - LessWrong. We can just drop the axiom of rational convergence.

First LessWrong post! Inspired by Richard Rorty, we argue for a different view of AI alignment, where the goal is "more like sewing together a very large, elaborate, polychrome quilt", than it is "like getting a clearer vision of something true and deep"
www.lesswrong.com/posts/S8KYwt...

11 months ago 6 1 3 0

The quality of London's museums is just amazing! Enjoy!

1 year ago 3 0 0 0
A Theory of Appropriateness with Applications to Generative Artificial Intelligence (YouTube video by MITCBMM)

In case folks are interested, here's a video of a talk I gave at MIT a couple weeks ago: youtu.be/FmN6fRyfcsY?...

1 year ago 8 3 0 0

Our new evaluation method, Soft Condorcet Optimization, is now available open-source!

Both the sigmoid (smooth Kendall-tau) and Fenchel-Young (perturbed optimizers) versions.

Also, an optimized C++ implementation that is ~40X faster than the Python one. 🤩⚡

github.com/google-deepm...

1 year ago 16 3 0 1
SCaLA-25 A workshop connecting research topics in social choice and learning algorithms.

Working at the intersection of social choice and learning algorithms?

Check out the 2nd Workshop on Social Choice and Learning Algorithms (SCaLA) at @ijcai.bsky.social this summer.

Submission deadline: May 9th.

I attended last year at AAMAS and loved it!

sites.google.com/corp/view/sc...

1 year ago 19 6 0 2

If the AAMAS website is a good reference for this, it may not be, but I'm uncertain atm.

1 year ago 1 0 1 0

Come to understand ML evaluation from first principles! We have put together a great AAMAS tutorial covering statistics, probabilistic models, game theory, and social choice theory.

Bonus: a unifying perspective of the problem leveraging decision-theoretic principles!

Join us on May 19th!

1 year ago 6 1 1 0

Re #2: The key finding there is that the stationary points of SCO contain the margin matrix but, as I said in the note, there is still more work to do!

1 year ago 1 0 1 0

Thanks! I have been meaning to update the manuscript to stand alone without the main paper, but instead I may have changed the content to a different format 😉. Coming soon!

1 year ago 1 0 2 0

Ah, I see the confusion... I never used the "identically distributed assumption," only the independence assumption (from 8 to 9).

1 year ago 1 0 0 0

I'm not sure if I understood your question correctly, but yes? As the post you shared says, "Voila! We have shown that minimizing the KL divergence amounts to finding the maximum likelihood estimate of θ." Maybe I am missing your point 😬

1 year ago 0 0 2 0
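
For readers without the linked post at hand, the quoted claim can be sketched in a few lines. The notation is an assumption made for illustration and is not taken from the thread: p_data is the data distribution, p_θ the model, and x_1, ..., x_n are samples assumed to be drawn from p_data.

```latex
\begin{align*}
D_{\mathrm{KL}}(p_{\text{data}} \,\|\, p_\theta)
  &= \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_{\text{data}}(x)\right]
   - \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_\theta(x)\right] \\
\arg\min_\theta D_{\mathrm{KL}}(p_{\text{data}} \,\|\, p_\theta)
  &= \arg\max_\theta \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_\theta(x)\right] \\
  &\approx \arg\max_\theta \frac{1}{n} \sum_{i=1}^{n} \log p_\theta(x_i)
\end{align*}
```

The first expectation does not depend on θ, so it drops out of the minimization. The last line replaces the expectation with a sample average, which is the (scaled) log-likelihood.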

Elo drives most LLM evaluations, but we often overlook its assumptions, benefits, and limitations. While working on SCO, we wanted to understand the SCO-Elo distinction, so I looked and uncovered some intriguing findings and documented them in these notes. I hope you find them valuable!

1 year ago 2 1 0 0

Looking for a principled evaluation method for ranking *general* agents or models, i.e., ones that get evaluated across a myriad of different tasks?

I’m delighted to tell you about our new paper, Soft Condorcet Optimization (SCO) for Ranking of General Agents, to be presented at AAMAS 2025! 🧵 1/N

1 year ago 66 17 1 6

I had the convexity results for the online pairwise update (Section B.1.1.1) in my notes (manfreddiaz.github.io/assets/pdf/s...), but it is not clear to me if they hold for the other non-online settings. Worth taking a more detailed pass over the paper!

1 year ago 2 0 0 0

That's a nice finding, @sacha2.bsky.social! @sharky6000.bsky.social I skimmed over it, and it seems neat! There is an important distinction, though. They work with the "online" Elo regime, departing from the traditional gradient/batch gradient descent updates. (e.g., FIDE doesn't use online updates)

1 year ago 2 0 1 0
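
To make the online versus batch distinction concrete, here is a minimal sketch using the textbook Elo expected-score formula. It is not taken from the paper or the notes discussed in the thread; the function names, the K-factor of 32, the 400-point scale, and the two-game example are all assumptions chosen for illustration.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Textbook Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def online_elo(ratings, games, k=32.0):
    """Online regime: ratings change after every single game."""
    r = dict(ratings)
    for a, b, score_a in games:
        delta = k * (score_a - expected_score(r[a], r[b]))
        r[a] += delta
        r[b] -= delta
    return r


def batch_elo(ratings, games, k=32.0):
    """Batch regime: sum all corrections against fixed ratings, apply once."""
    r = dict(ratings)
    total = {p: 0.0 for p in r}
    for a, b, score_a in games:
        # Expected scores use the ratings from before the batch started.
        delta = k * (score_a - expected_score(ratings[a], ratings[b]))
        total[a] += delta
        total[b] -= delta
    for p in r:
        r[p] += total[p]
    return r


# Hypothetical example: A beats B twice, both starting at 1500.
start = {"A": 1500.0, "B": 1500.0}
games = [("A", "B", 1.0), ("A", "B", 1.0)]
online = online_elo(start, games)
batch = batch_elo(start, games)
```

In this toy example the two regimes disagree: the online update shrinks the second correction because A's rating has already moved, while the batch update scores both games against the starting ratings.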

lol 😀

1 year ago 3 0 0 0
Michael I. Jordan - Wikipedia

Not that Michael Jordan, but this one en.wikipedia.org/wiki/Michael...

1 year ago 3 0 1 0