Whoa—my book is up for pre-order!
𝐌𝐨𝐝𝐞𝐥 𝐭𝐨 𝐌𝐞𝐚𝐧𝐢𝐧𝐠: 𝐇𝐨𝐰 𝐭𝐨 𝐈𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭 𝐒𝐭𝐚𝐭 & 𝐌𝐋 𝐌𝐨𝐝𝐞𝐥𝐬 𝐢𝐧 #Rstats 𝐚𝐧𝐝 #PyData
The book presents an ultra-simple and powerful workflow to make sense of ± any model you fit
The web version will stay free forever and my proceeds go to charity.
tinyurl.com/4fk56fc8
Posts by Nathaniel B
You have not seen second generation p-values used anywhere because they were badly reinvented equivalence tests, and people should use equivalence tests instead. See open.lnu.se/index.php/me...
"New study finds salary benchmarking cuts pay gaps by 25%. Pay dispersion partly arises from firms’ uncertainty about market rates, with key implications for pay transparency policy."
New paper from Perez-Truglia, Li & Cullen
www.restud.com/whats-my-emp...:
#econsky
#REStud
Posit and AWS have signed a Strategic Collaboration Agreement!
This collaboration helps customers modernize their data infrastructure & accelerate their data science journey on the cloud, making it easier for teams to build and run critical data science outcomes.
Read now posit.co/blog/posit-s...
I am beyond excited to announce that ggplot2 4.0.0 has just landed on CRAN.
It's not every day we have a new major #ggplot2 release, but it is a fitting 18th birthday present for the package.
Get an overview of the release in this blog post and be on the lookout for more in-depth posts #rstats
Models as Prediction Machines: How to Convert Confusing Coefficients into Clear Quantities
Abstract: Psychological researchers usually make sense of regression models by interpreting coefficient estimates directly. This works well enough for simple linear models, but is more challenging for more complex models with, for example, categorical variables, interactions, non-linearities, and hierarchical structures. Here, we introduce an alternative approach to making sense of statistical models. The central idea is to abstract away from the mechanics of estimation, and to treat models as “counterfactual prediction machines,” which are subsequently queried to estimate quantities and conduct tests that matter substantively. This workflow is model-agnostic; it can be applied in a consistent fashion to draw causal or descriptive inference from a wide range of models. We illustrate how to implement this workflow with the marginaleffects package, which supports over 100 different classes of models in R and Python, and present two worked examples. These examples show how the workflow can be applied across designs (e.g., observational study, randomized experiment) to answer different research questions (e.g., associations, causal effects, effect heterogeneity) while facing various challenges (e.g., controlling for confounders in a flexible manner, modelling ordinal outcomes, and interpreting non-linear models).
Figure illustrating model predictions. On the X-axis the predictor, annual gross income in Euro. On the Y-axis the outcome, predicted life satisfaction. A solid line marks the curve of predictions on which individual data points are marked as model-implied outcomes at incomes of interest. Comparing two such predictions gives us a comparison. We can also fit a tangent to the line of predictions, which illustrates the slope at any given point of the curve.
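The three quantities in that figure (a prediction, a comparison between two predictions, and a slope) can be sketched in a few lines of plain Python. This is only an illustration: the log-income model and its coefficients are hypothetical and not taken from the paper, and the marginaleffects package is not used here.

```python
import math

# Hypothetical fitted model: life satisfaction as a function of log income.
# Both coefficients are made up for illustration.
INTERCEPT = 2.0
COEF_LOG_INCOME = 0.5

def predict(income: float) -> float:
    """Model-implied life satisfaction at an annual gross income in Euro."""
    return INTERCEPT + COEF_LOG_INCOME * math.log(income)

def comparison(low: float, high: float) -> float:
    """Difference between the predictions at two incomes of interest."""
    return predict(high) - predict(low)

def slope(income: float, rel_step: float = 1e-4) -> float:
    """Central finite-difference slope of the prediction curve at one income."""
    h = rel_step * income
    return (predict(income + h) - predict(income - h)) / (2 * h)

diff_30k_60k = comparison(30_000, 60_000)  # effect of doubling income
slope_30k = slope(30_000)                  # tangent at 30,000 Euro
slope_60k = slope(60_000)                  # tangent at 60,000 Euro (flatter)

print(f"prediction at 30k: {predict(30_000):.3f}")
print(f"comparison 30k -> 60k: {diff_30k_60k:.3f}")
print(f"slope at 30k: {slope_30k:.2e}, slope at 60k: {slope_60k:.2e}")
```

Because the hypothetical curve is concave, the same extra Euro moves the prediction less at 60,000 than at 30,000, which is why the slope is reported at a specific point rather than as one number.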
A figure illustrating various ways to include age as a predictor in a model. On the x-axis age (predictor), on the y-axis the outcome (model-implied importance of friends, including confidence intervals). Illustrated are 1. age as a categorical predictor, resulting in predictions that bounce around a lot with wide confidence intervals, 2. age as a linear predictor, which forces a straight line through the data points that has a very tight confidence band, and 3. age splines, which lie somewhere in between, as they smoothly follow the data but have more uncertainty than the straight line.
Ever stared at a table of regression coefficients & wondered what you're doing with your life?
Very excited to share this gentle introduction to another way of making sense of statistical models (w @vincentab.bsky.social)
Preprint: doi.org/10.31234/osf...
Website: j-rohrer.github.io/marginal-psy...
We can innovate with new setups though instead of having it be academic / rigorous (with heavy caveats for that adjective) vs data bloggers. Having a reproducibility norm would be critical to such a new setup.
If interested please test the notebooks.
Examples:
Describe
github.com/gabors-data-...
OLS
github.com/gabors-data-...
Random forest, SHAP
github.com/gabors-data-...
Panel FE, FD
github.com/gabors-data-...
I will insist on maximizing interaction terms on both platforms
Hey I'm no consultant
Very pleased to announce our special issue on Gender Norms, edited by @emmatominey.bsky.social and @nikkishure.bsky.social.
onlinelibrary.wiley.com/doi/toc/10.1...
this was excellent. a highly recommended way to spend 5 hours.
St Louis and Baltimore show up as the same point because they both went exactly from 74 to 56 in raw counts, which yields the same raw % decline and the same model-predicted underlying % decline.
Scatter plot of 31 U.S. cities (excluding “Total”) showing raw vs. Bayesian MAP–regularized percent change in homicides from 2024 to 2025. Raw changes are on the x-axis, regularized changes on the y-axis. Each point is labeled by city, with small arrows connecting any labels placed more than 10 pixels from their marker. A caption below summarizes the log‐ratio transformation, hierarchical Normal(μ,τ²) model with an Inverse‐Gamma penalty on τ², and the posterior‐mean back-transform to percent change.
Screenshot of Jeff Asher’s Substack titled “Murder is down a lot in big cities.” Below the title is a subtitle: “2025 vs 2024 YTD murders in top 30 most murderous cities.” A table lists four columns—City, YTD 2024, YTD 2025, and % Change—and shows, in descending order of decline, that Denver fell from 29 to 13 homicides (–55.2 %), Birmingham from 61 to 33 (–45.9 %), Cleveland 54→30 (–44.4 %), San Antonio 50→30 (–40.0 %), Las Vegas 54→35 (–35.2 %), Atlanta 50→33 (–34.0 %), Dallas 90→60 (–33.3 %), Indianapolis 70→49 (–30.0 %), and so on.
I indulged in a bit of fun doing a first pass of a regularized model for homicide declines with ChatGPT, using @jeffasher.bsky.social's data from jasher.substack.com/p/why-i-thin... to estimate "underlying" rates of homicide decline by city.
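A rough sketch of the shrinkage idea, using only the counts quoted in the posts above. It is a simplified stand-in, not the model from the plot: the prior SD `TAU` is fixed by hand (hypothetical) rather than estimated under an Inverse-Gamma penalty, the sampling variance of each log-ratio is approximated as 1/before + 1/after (a Poisson delta-method assumption), and the back-transform is a plain exp() of the shrunk log-ratio.

```python
import math

# (YTD 2024, YTD 2025) homicide counts quoted in the posts above.
counts = {
    "Denver": (29, 13),
    "Birmingham": (61, 33),
    "Cleveland": (54, 30),
    "San Antonio": (50, 30),
    "Las Vegas": (54, 35),
    "Atlanta": (50, 33),
    "Dallas": (90, 60),
    "Indianapolis": (70, 49),
    "St Louis": (74, 56),
    "Baltimore": (74, 56),
}

TAU = 0.2  # hypothetical prior SD of the true log-ratios; fixed, not estimated

def to_pct(log_r: float) -> float:
    """Convert a log-ratio back to a percent change."""
    return (math.exp(log_r) - 1) * 100

# Observed log-ratios and their approximate sampling variances (assumption).
y = {c: math.log(after / before) for c, (before, after) in counts.items()}
v = {c: 1 / before + 1 / after for c, (before, after) in counts.items()}

# Grand mean, weighting each city by its total precision.
w = {c: 1 / (v[c] + TAU**2) for c in counts}
mu = sum(w[c] * y[c] for c in counts) / sum(w.values())

# Normal-normal shrinkage: noisier cities are pulled harder toward mu.
shrunk = {}
for c in counts:
    keep = TAU**2 / (TAU**2 + v[c])  # weight kept on the city's own data
    shrunk[c] = keep * y[c] + (1 - keep) * mu

raw_pct = {c: to_pct(y[c]) for c in counts}
shrunk_pct = {c: to_pct(shrunk[c]) for c in counts}

for c in counts:
    print(f"{c:<13} raw {raw_pct[c]:6.1f}%   shrunk {shrunk_pct[c]:6.1f}%")
```

Cities with small counts, such as Denver, move the most, and two cities with identical counts necessarily get identical raw and shrunk estimates.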
Thoughtfully planning your data collection efforts has a huge return on investment.
As someone who has cleaned data from projects both with and without data management planning, I can safely say it is worth the effort.
40 hours of planning > 80+ hours of cleaning messy data
A screenshot of a Bayesian statistical model training interface showing NUTS sampler progress. The display includes information about a model with 45 subgroup columns, training on 52,380 of 68,850 rows. The sampling is being performed using PyMC NUTS (JAX-compiled) with 4 chains running in parallel. The table shows the current progress of each chain with metrics including number of draws (ranging from 601-644), step size (0.01), gradient evaluations, sampling speed (approximately 1.14-1.22 draws/s), elapsed time (8:49), and remaining time estimates.
There's just something special about running a proper Bayesian MCMC script that I can't replace with anything else after doing it for almost 10 years
“First, about half the time I reanalyze a study, I find that there are important bugs in the code, or that adding more data makes the mathematical finding go away, or that there’s a compelling alternative explanation for the results.”
davidroodman.com/blog/2025/05...
7/ A series of errors create an illusion of statistically significant treatment effects of similar size in the five strategies. Examples include changing/adding/removing variables in regressions, using incorrect bandwidths, and changing clustering methods.
1/ The paper claimed that the reform caused a jump in rape by 50–60 percent. This seemed hard to reconcile with the flat time trend in reported rapes around the time of the reform.
I need a newsletter because I have so many words inside of me about the "they say they want to raise the birth rate and yet they are decimating maternal and infant health surveillance infrastructure [or whatever else], what a gotcha" thing
Jeet is wrong about fighting inflation hurting growth tho
You still only have N=1 per unit of the running variable when it's cohort, once you consider the hierarchy. Again refer to the survey example: even if you have a measure with 10,000 respondents per unit time, that doesn't negate the issue that you only have one unit of time per unit time.
It's similar to using really large surveys for a pre-post comparison. Yes you're increasing the N literally in a tabular sense, but you still effectively have a few data points across time that you're measuring with more and more precision as you jack up N.
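A quick way to see the survey point, as a sketch under stated assumptions: suppose each period's survey mean equals a long-run level plus a shock shared by everyone in that period plus ordinary sampling noise. The two standard deviations below are hypothetical, and the two periods are assumed independent.

```python
import math

# Hypothetical standard deviations, chosen only for illustration.
SD_PERSON = 1.0  # spread of individual responses within a period
SD_PERIOD = 0.1  # spread of period-level shocks shared by all respondents

def sd_period_mean(n: int) -> float:
    """SD of one period's survey mean around the long-run level."""
    return math.sqrt(SD_PERIOD**2 + SD_PERSON**2 / n)

def sd_pre_post_diff(n: int) -> float:
    """SD of (post mean - pre mean) with n respondents in each period."""
    return math.sqrt(2) * sd_period_mean(n)

for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}  sd(period mean)={sd_period_mean(n):.4f}  "
          f"sd(pre-post diff)={sd_pre_post_diff(n):.4f}")
```

More respondents shrink only the sampling term; the uncertainty never falls below the period-level SD, because there is still just one draw of the period shock per period.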
I get this last point, but the distinctions you're emphasizing seem irrelevant, with the logic from the AR piece applying just as much for cohort. For example, what looks like increased N isn't that simple here, because of correlation between people of the same cohort.
It's an RDD in cohort, which, when only using one cutoff (not eg one repeated year after year), has the same issues from the Annual Reviews piece you link:
- requires observations far from the threshold
- requires considering the time-series nature of the DGP
- the McCrary test becomes irrelevant
@icpsr.bsky.social @umich.edu is hosting DataLumos, a crowdsourced repository of government data: archive.icpsr.umich.edu/datalumos/home
Seeing the "shingles vaccine protects from dementia" claim going around again.
Unfortunately the main study it's based on is just not convincing. @epiellie.bsky.social has written a great post about why:
hey hey my constant use of the palmerpenguins data is now relevant for my public policy classes