“Statistical and inductive inference by minimum message length”, by Chris Wallace, is honestly one of my favorite books.
“An introduction to universal artificial intelligence”, by Marcus Hutter, also goes very in depth into Solomonoff induction & why it’s so powerful.
Posts by Petal Mokryn
I’m personally more familiar with the Bayesian approaches to this, less so the MDL & NML stuff
Whichever you use, both rely on the idea that inference and codes are deeply related, and that codes can be a powerful tool for inference.
This stuff is heavily related to Minimum Description Length principles, which in turn relate to Kolmogorov complexity.
The Bayesian counterparts are Minimum Message Length & Solomonoff induction.
There’s indeed a ton of really fascinating stuff there.
Thus far I couldn’t find anything, but yeah I’m sure someone must’ve at least thought about it before.
So I think we should just use the posterior joint log probability of the observed data as a summary statistic to test in posterior predictive checks.
Or rather, use “H + (log(P)/N)”, where P is the posterior probability (or density) of the observed data. (8/8)
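A minimal sketch of that statistic (the IID Bernoulli model and the `typicality_stat` helper are my own illustrative choices, not from the thread): for a typical sample, H + log(P)/N sits near 0, while atypical data pushes it away from 0.

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_entropy(p):
    """Entropy rate (in nats) of an IID Bernoulli(p) process."""
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def typicality_stat(data, p):
    """H + log(P)/N: near 0 for typical samples of Bernoulli(p)."""
    log_joint = np.sum(np.where(data == 1, np.log(p), np.log(1 - p)))
    return bernoulli_entropy(p) + log_joint / len(data)

p = 0.9
typical = rng.binomial(1, p, size=10_000)   # an actual sample
all_ones = np.ones(10_000, dtype=int)       # the single most likely sequence

print(typicality_stat(typical, p))   # ~0, within a few standard errors
print(typicality_stat(all_ones, p))  # ~0.22: high probability, yet atypical
```

Note the all-ones sequence scores as atypical even though it is the most probable single sequence, which is exactly the point of the typical-set view.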
The Asymptotic Equipartition Property has been shown to hold for so many different classes of stochastic processes that I suspect it might be fundamental to the very idea of stochastic processes. At the very least, anything that violates the AEP must be very pathological. (7/?)
But there’s no unified formal way to do this; it’s done on a more case-by-case basis. And I’m just here thinking… why not use the posterior joint log probability of the observed data? (6/?)
, then the observed data should be similar to the pseudodata. A lot of people just plot them together and check whether the observed data lies within the pseudodata. Sometimes a summary statistic is tested, such as the mean of the observed data vs. the mean of the pseudodata. (5/?)
Sorry, it should be near -NH.
So, a lot of practitioners do a posterior predictive check simply by generating a lot of pseudodata from the posterior, and comparing it to the observed data that generated the posterior. If the posterior is accurate to the true data generating process, (4/?)
Asymptotically so, of course. For finite sample size you get Frequentist bounds on the joint log probability of the sampled sequence - it should be near N*H, where H is the entropy rate of the process.
Now, how does this relate to posterior predictive checks? (3/?)
This holds for some very wide classes of stochastic processes, which have an “Asymptotic Equipartition Property” - there’s a Typical set of sequences with nearly equal (and individually tiny) probabilities, and as N increases you’re sampling from that Typical set with probability approaching 1. (2/?)
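Stated formally (a standard information-theory fact, with base-2 logs): for processes with the AEP, the per-symbol log probability concentrates on the entropy rate H,

```latex
-\frac{1}{N}\log_2 p(X_1,\dots,X_N) \;\longrightarrow\; H
\quad \text{(in probability, as } N \to \infty\text{)}
```

so each typical sequence has probability roughly 2^{-NH}, and the Typical set, though a vanishing fraction of all sequences, carries total probability approaching 1.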
If you do N IID Bernoulli experiments with p=0.9, a sequence of N 1’s is the highest probability sequence, but it’s not a *typical* sample of the N Bernoullis - you don’t actually expect to draw N ones.
Turns out, there’s a set of sequences you *do* expect to draw from - the Typical set. (1/?)
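A quick numerical illustration of that Bernoulli example (my own toy demo; N, the number of draws, and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
N, p = 200, 0.9

# Probability of the single most likely sequence (all ones): vanishingly small.
p_all_ones = p ** N   # ~7e-10

# Empirically, we essentially never draw it...
draws = rng.binomial(1, p, size=(20_000, N))
frac_all_ones = np.mean(draws.sum(axis=1) == N)

# ...because typical sequences have about N*p = 180 ones, not 200.
mean_ones = draws.sum(axis=1).mean()
print(p_all_ones, frac_all_ones, mean_ones)
```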
One small thing I might eventually seek to push for is using typicality as a more unified summary statistic for posterior predictive checks, since all such checks essentially perform a Frequentist hypothesis test of whether the observed data lies in the typical set of the posterior model
Just a pet peeve
I specifically prefer the Bayesian approach due to how explicit all the assumptions are, and how they’re all essentially in one place (in the model itself) which I hope should make it easier to catch bad modeling/cheating.
And there are a ton of diagnostics to check posteriors, from MCMC or VI
What good reasons are there to forgo Bayesian inference?
I’ve yet to see circumstances in which Bayesian inference is wholly inappropriate, other than it perhaps being tricky/costly to fit or if the choice of priors is bad
I’m always happy to learn of new perspectives, if you’re willing to share!
The disappointment I felt when I realized you didn’t mean E. T. Jaynes…
(6/6) The Frequentist approach, I’d teach students only *after* they learn Bayesian stats, and intuitively understand stats as a form of applied epistemology.
Just my personal opinion, I think that’d really help students develop stats intuition instead of seeing it as a list of recipes for data.
(5/6) After that 1 introductory lesson, I’d teach students a full course on Bayesian statistics, which is far more intuitively approachable than the Frequentist paradigm.
(4/?) I’d finish the intro by talking about how you can compose Frequentist asymptotic analysis on top of estimators derived from Bayesian inference - a parameter estimator derived from a Bayesian method (e.g. MAP, posterior mean, MML) can be treated like any other estimator in the Frequentist paradigm
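A toy sketch of that composition (the Beta(2, 2) prior and the specific numbers are my own assumptions): derive a MAP estimator from a Bayesian model, then study its Frequentist sampling properties by simulating repeated experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
true_p, n, reps = 0.3, 50, 100_000

# The Frequentist thought experiment: repeat the n-trial experiment many times.
k = rng.binomial(n, true_p, size=reps)

mle = k / n                   # plain maximum likelihood
map_est = (k + 1) / (n + 2)   # MAP (posterior mode) under a Beta(2, 2) prior

# Frequentist properties of the Bayesian-derived estimator:
bias_map = map_est.mean() - true_p                 # small nonzero bias (shrinkage)
mse_map = np.mean((map_est - true_p) ** 2)
mse_mle = np.mean((mle - true_p) ** 2)
print(bias_map, mse_map, mse_mle)
```

In this setup the MAP estimator trades a little bias for lower variance, and ends up with smaller mean squared error than the MLE.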
(3/?) - as possible about the data generating process, used together with asymptotic theory (what’d happen if you were to repeat the experiment N->inf times?) to derive properties of estimators.
I’d first explain what estimators are at the start of the Frequentist explanation, ofc.
(2/?) First, I’d give a brief overview of what statistics aims to do (forming beliefs about the world), and roughly how Bayesians & Frequentists both go about it.
Bayesians with fully specified prior distributions over all considered possible models, and Frequentists with as few assumptions -
(1/?) I’ve never (formally) taught a course before, so this is just my subjective pedagogy opinion
But if I were to teach statistics however I want to, I’d probably do roughly as follows:
In inf dim it’s more about the limitations of the math we have thus far. People are working very hard to push things further in functional analysis.
The Gaussian stuff *is* meaningful in certain contexts, e.g. in particle physics it’s kinetic energy
In other contexts it’s just easier to math
Something I personally think is cool is how the Gaussian integral is really the *only* integral we can analytically solve in arbitrarily high & infinite dimensions, and how so so so many problems across much of modern science are just about trying to extend stuff from the Gaussian case
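Concretely, the identity in question (a standard result, for symmetric positive-definite A):

```latex
\int_{\mathbb{R}^n} \exp\!\left(-\tfrac{1}{2}\, x^{\top} A x + b^{\top} x\right) dx
\;=\; \sqrt{\frac{(2\pi)^n}{\det A}}\;\exp\!\left(\tfrac{1}{2}\, b^{\top} A^{-1} b\right)
```

Every factor on the right still makes sense as n grows, which is part of why Gaussian measures survive the passage to infinite dimensions while Lebesgue measure itself does not.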
Maybe that penalized MLE estimators can be interpreted from a Bayesian perspective?
There’s a whole zoo of Bayesian point estimators e.g. maximum a-posteriori, posterior mean, minimum message length, etc. And in the end, most of them look like some form of penalized maximum likelihood estimator.
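For instance, MAP with a Gaussian prior on linear-regression weights is exactly ridge regression, i.e. an L2-penalized MLE. A small self-contained check (the data here is synthetic, and the variances are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.5, size=100)

sigma2, tau2 = 0.25, 1.0    # noise variance, prior variance
lam = sigma2 / tau2         # the equivalent ridge penalty

# Penalized MLE (ridge): argmin ||y - Xw||^2 + lam * ||w||^2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# MAP with prior w ~ N(0, tau2 * I) and noise N(0, sigma2):
# the posterior mode solves the identical normal equations.
w_map = np.linalg.solve(X.T @ X / sigma2 + np.eye(3) / tau2, X.T @ y / sigma2)

print(np.allclose(w_ridge, w_map))  # True: the two estimators coincide
```

Multiplying the MAP normal equations through by sigma2 recovers the ridge equations exactly, with penalty lam = sigma2/tau2.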
I think these papers may be relevant here
www.nature.com/articles/s41...
www.pnas.org/doi/pdf/10.1...
Tl;dr, the idea is that a wide class of functions exhibits “simplicity bias” when the inputs are drawn uniformly at random, and that this effect biases biology toward simplicity.
- on an emotional level. Concrete predictions, put into brief but compelling story form.
Would that be any good in your opinion, or nah? 😅
If the goal is to get the message across to people not already informed on the matter, maybe specific forecasts on likely ways society will collapse if the issues aren’t resolved?
A few different possibilities (gotta represent forecasting uncertainty ofc), each a story hammering the point home -
There are still limitations of course. Especially in mathematical tractability - IFT is a statistical field theory, and things can get complicated fast if the spatial/spatiotemporal data you’re trying to infer has particularly complicated dynamics/statistics.
I also think there’s a lot of room both in making new & exciting variations on the method, and in applying it to new problems.
Oh it’s also very scalable. That’s a major bonus. Can’t forget about the scalability.