
Posts by Andrés Corrada

NTQR 0.7.5 documentation

You can explore this logic and carry out your own computations by downloading my Open Source Python package - NTQR (pip install ntqr). ntqr.readthedocs.io/en/latest

23 hours ago

There is an algebra and geometry to this logic that is straightforward. The algebra comes from the three sets of axioms that are universally true for any classification test. We can describe and count these axioms because we stripped the semantics from the test. Logical consistency, not soundness.

23 hours ago
Slide from NAML 2026 talk on using logical consistency to build no-knowledge alarms for misaligned classifiers.

In particular, we can build a logic, devoid of all probability theory, for the unsupervised evaluation of classifiers. Its fundamental question is simple - what group evaluations are logically consistent with the counts of how classifiers agree and disagree?
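
To make that question concrete, here is a minimal sketch in plain Python (my own names, not the NTQR API): for two binary classifiers that disagreed on d of Q questions, it enumerates every pair of total-correct counts logically consistent with that observation.

# Sketch only: which pairs of total-correct counts (c1, c2) are logically
# consistent with two binary classifiers disagreeing on d of Q questions?
# On each disagreement exactly one of them is right; on each agreement
# both are right or both are wrong.
def consistent_evaluations(Q, d):
    evals = set()
    for k in range(Q - d + 1):       # agreements where both are correct
        for j in range(d + 1):       # disagreements where classifier 1 is correct
            evals.add((k + j, k + d - j))
    return sorted(evals)

# Q = 10 questions, d = 4 disagreements; note (10, 10) never appears:
# with any disagreement at all, they cannot both be 100% correct.
print(consistent_evaluations(10, 4))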

23 hours ago

Probability has also been abused in problems of unsupervised evaluation of classifiers. Starting with Dawid & Skene and continuing with Parisi, all approaches have assumed that such evaluation can only be done via probabilistic assumptions. Not so.

23 hours ago

The contrast between the tombstones visually amplifies the unlikelihood of the renunciation.

1 day ago

As someone who works on the logic of unsupervised evaluation for classifiers, I'm intrigued. The no-knowledge alarms one can build from such logic for ensembles of classifiers - alerting, without ground truth, when at least one of them is malfunctioning - would be useful to any mind, human or robotic.

1 day ago
Call for Proposals — Compute! Paris 2026
Submit your talk proposal for Compute! Paris 2026. The Call for Proposals is open from April 15th to May 24th, 2026.

The team of JupyterCon 2023, PyData Paris 2024 & 2025 organizes a new conference named Compute! Paris 2026 on open source computation and data. The event will take place on November 25–26, 2026 at Sorbonne Université in Paris.

CfP deadline: May 24, 2026: compute.events/paris2026/cf...

2 days ago

The interesting part is counting how many of the axioms are independent. This tells us how much we can learn from having just the counts of their agreements and disagreements.

1 day ago

You cannot have a logic without axioms. So what would be the axioms for a logic of unsupervised evaluation for classifiers? If we focus on counts of how experts agree/disagree in their decisions, we can count the axioms from elementary considerations. ntqr.readthedocs.io/en/latest/no...
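
As an illustration of that counting (my own notation and a hypothetical sketch, not the package's code), the axioms for a single binary classifier on Q questions can be written out as linear equations and their independence checked with a rank computation:

# Unknowns: Q_a, Q_b (true label counts), R_aa (correct 'a' decisions),
# R_bb (correct 'b' decisions).  Observables: R_a, R_b (response counts).
#   (1) Q_a + Q_b         = Q     (the answer key lives on the Q-simplex)
#   (2) Q_b + R_aa - R_bb = R_a   ('a' responses, split by true label)
#   (3) Q_a - R_aa + R_bb = R_b   ('b' responses, split by true label)
import numpy as np

A = np.array([[1, 1,  0,  0],    # eq (1); columns: Q_a, Q_b, R_aa, R_bb
              [0, 1,  1, -1],    # eq (2)
              [1, 0, -1,  1]])   # eq (3)
print(np.linalg.matrix_rank(A))  # 2: since R_a + R_b = Q, eq (3) = eq (1) - eq (2)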

1 day ago

The algebra part is relatively easy. This is a step release toward version v0.8 that will rework the hard part - solving for the possible and consistent sets of group evaluations: the integer solutions to the linear Diophantine system defined by the axiom groups - simplex, marginalization, and observable.
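
For the single-classifier slice of that system (the marginalization axioms only enter with ensembles), the integer solutions can simply be brute-forced; a hypothetical sketch with made-up numbers, not the v0.8 code:

# Observed: Q questions, R_a responses of 'a'.  Unknowns: Q_a (true 'a'
# questions), R_aa (correct 'a' decisions), R_bb (correct 'b' decisions).
# Axiom: R_a = R_aa + (Q - Q_a - R_bb), i.e. 'a' responses are correct 'a'
# decisions plus mistakes made on true-'b' questions.
Q, R_a = 5, 3
solutions = [(Q_a, R_aa, R_bb)
             for Q_a in range(Q + 1)
             for R_aa in range(Q_a + 1)
             for R_bb in range(Q - Q_a + 1)
             if R_aa + (Q - Q_a - R_bb) == R_a]
print(len(solutions), solutions[:5])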

2 days ago
Jupyter notebook demonstrating the algebra and geometry of the axioms of unsupervised evaluation.

How many of the evaluation axioms for classifiers are independent?

I've released version 0.7.5 of the Python Open Source package NTQR. This begins the reworking of the code with a simpler formulation based on the straightforward "origin" of the axioms for the logic of unsupervised evaluation for classifiers. ntqr.readthedocs.io/en/latest/no...

2 days ago

We can use logical consistency between the graders to create a way to test if the LLM-as-Judges are failing our safety standards. arxiv.org/abs/2510.00821 This allows no-knowledge alarms that use proof-by-contradiction to alert when members are malfunctioning. ntqr.readthedocs.io/en/latest/no...
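
A toy version of that alarm logic (hypothetical names and numbers, not the paper's or the package's code): every disagreement between two judges requires at least one mistake, so a claimed accuracy floor for both judges bounds how many disagreements are logically possible.

def misalignment_alarm(Q, disagreements, accuracy_floor):
    # A judge meeting the floor makes at most (1 - floor) * Q mistakes, and
    # each disagreement consumes at least one mistake from the pair.
    max_disagreements = 2 * (1 - accuracy_floor) * Q
    return disagreements > max_disagreements

# Both judges claimed >= 90% accuracy but disagreed on 30 of 100 items:
print(misalignment_alarm(Q=100, disagreements=30, accuracy_floor=0.90))  # True -> alarm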

1 month ago

You can play with this logic today by using my NTQR Python package. ntqr.readthedocs.io/en/latest

1 month ago

correct "a" answers there are in it, how many correct "b" etc. This illustrates the utility of having a counting logic. We are guaranteed to have a complete representation of all possible answer keys in any domain. This makes the algorithms of the counting algebra universal - a logic.

1 month ago

Illustrated here is a Q=25 test for two LLMs-as-Judges doing pair-comparisons of two worker LLMs. The judges are three-way classifiers. Where is the correct answer key for these comparisons? It is somewhere in the set of all possible answer keys. But wherever it is, it has a summary of how many ...

1 month ago

This is one of my slides for next week's presentation at NAML 2026 on using logical consistency to create a logic of unsupervised evaluation for classifiers. It all hinges on stripping any semantics from tests and using agreement and disagreement counts alone. But the answer keys are also stripped.

1 month ago

This is truly a "no-knowledge" situation. And yet we can immediately exclude one possible evaluation for these two experts - they cannot BOTH be 100% correct.
That single bit of disagreement, plus logic, has excluded a possible evaluation for them. And we still know nothing about this test.

2 months ago

The simplest example I can give of how the logic of unsupervised evaluation works uses two experts that we observe have disagreed in their answers to a test. We know nothing about the domain of the test, the actual answers, or their meaning. We just have one bit of information: that they disagree.
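
In code, that one bit is all the check needs (a minimal sketch, names mine):

# If both experts were 100% correct they would agree on every question, so
# a single observed disagreement already rules that joint evaluation out.
Q, observed_disagreements = 20, 1
both_perfect_is_possible = (observed_disagreements == 0)
print(both_perfect_is_possible)   # False - excluded, knowing nothing else about the test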

2 months ago
The Cartoon Guide to Löb's Theorem — LessWrong
Lo! A cartoon proof of Löb's Theorem! • Löb's Theorem shows that a mathematical system cannot assert its own soundness without becoming inconsistent…

We can interpret Löb's theorem from mathematical logic as saying that a monitor M of system S can never prove that S is safe -- a cartoon guide to it by @yudkowsky.bsky.social
My work on using logic for safer AI systems skirts this block using contradiction. www.lesswrong.com/posts/ALCnqX...

2 months ago

You can try these logical alarms today. ntqr.readthedocs.io/en/latest/no...

2 months ago

All tests of size Q have exactly the same simplex space representation no matter their domain. It is a semantics-free counting space that is guaranteed to contain the point where the true answer key would map (green point in this illustration).
This completeness is the superpower of this logic.

2 months ago

There are no axioms for computer programs that would allow you to solve the Halting Problem universally. This is not so for evaluations of classifiers. There we are discussing evaluation, not world models. And it is trivial to trap all possible answer keys to any test by their statistical summary.

2 months ago
Logical consistency is inescapable and universal. This can form the basis of a logic of unsupervised evaluation for any group of experts. By stripping semantics, we create a universally applicable logic.

Formal verification of the correctness of AI models is a hard problem. In contrast, formal verification of evaluations of AI models is trivial! Why?
Because we have no theory of all the correct theories we can have about the World. Similarly, we have no universal meta-theory of valid programs.

2 months ago

2. Ensemble counts must marginalize correctly. These two allow you to define the set of possible evaluations for N classifiers that processed Q items.
The last set are the equations that tell you that every decision-event count must equal the sum of that event's counts GIVEN each true label.
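
A sketch of the marginalization and decomposition equations for a pair of binary classifiers (assumed notation and made-up counts, not the NTQR API):

# Observed pair-decision counts on Q items, e.g. N['ab'] = questions where
# classifier 1 said 'a' and classifier 2 said 'b'.
N = {'aa': 11, 'ab': 4, 'ba': 3, 'bb': 7}
Q = sum(N.values())

# Marginalization: pair counts must reproduce each classifier's own counts.
clf1_a_responses = N['aa'] + N['ab']
clf2_a_responses = N['aa'] + N['ba']
assert clf1_a_responses + (N['ba'] + N['bb']) == Q

# Decomposition: every observed decision-event count is the sum of that
# event's counts given each true label, e.g.
#   N['ab'] = (count of 'ab' decisions on true-'a' items)
#           + (count of 'ab' decisions on true-'b' items).
# Those per-label counts are the unknown evaluations the logic constrains.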

2 months ago

The preprint is on arXiv. I should say the paper is correct but muddled. It needs to be updated to a much clearer statement of how there is a logic of unsupervised evaluation of classifiers. There are two sets of linear equations that come from: 1. the complete list of discrete events.

2 months ago

By stripping a classification test or multiple-choice exam down to counts of label events, either known or unknown, we create a universally valid "trap" for the unknown true answer key to the test the classifiers have taken.
This is the superpower of this counting logic of evaluation.

2 months ago

the much harder task of formally verifying a model's use of data to make a decision has the same problem Sherlock does. How do we know we are considering all the possibilities?
This is not the case for classification tests: counting models of them are complete!

2 months ago

This counting algebra enforces logical consistency on the joint evaluations we can give test takers once we have seen how they agree and disagree. It is similar to Sherlock's logic, but better!
This gets to the core of why this algebraic logic of evaluations for classifiers is easier than ...

2 months ago

ancient the question is. Juvenal asks "Who guards the guards?" This infinite regress can only be terminated by assumptions. In Human-in-the-Loop AI we are saying the user is axiomatically correct. The counting algebra we can develop when we observe how experts disagree on a test is another such assumption.

2 months ago