This essay provides an overview of statistical methods in public policy, focused primarily on the United States. I trace the historical development of quantitative approaches in policy research, from early ad hoc applications through the 19th and early 20th centuries to the full institutionalization of statistical analysis in federal, state, local, and nonprofit agencies by the late 20th century. I then outline three core methodological approaches to policy-centered statistical research across social science disciplines: description, explanation, and prediction, framing each in terms of the focus of the analysis. In descriptive work, researchers explore what exists and examine any variable of interest to understand its distribution and its relationships with other variables. In explanatory work, researchers ask why it exists and how it can be influenced. The focus of the analysis is on explanatory variables (X), either to (1) accurately estimate their relationship with an outcome variable (Y), or (2) causally attribute the effect of specific explanatory variables on outcomes. In predictive work, researchers ask what will happen next and focus on the outcome variable (Y) and on generating accurate forecasts, classifications, and predictions from new data. For each approach, I examine key techniques, their applications in policy contexts, and important methodological considerations. I then consider critical perspectives on quantitative policy analysis, framed around a three-part "data imperative" in which governments are driven to count, gather, and learn from data. Each of these imperatives entails substantial issues related to privacy, accountability, democratic participation, and epistemic inequalities, all of which are at odds with public sector values of transparency and openness. I conclude by identifying some emerging trends in public sector-focused data science, inclusive ethical guidelines, open research practices, and future directions for the field.
| | Description | Explanation | Prediction |
|---|---|---|---|
| General question | What exists? | Why does it exist? How can it be influenced? | What will happen next? |
| Focus of analysis | Focus is on any variable: understanding different variables and their distributions and relationships | Focus is on X: understanding the relationship between X and Y, often with an emphasis on causality | Focus is on Y: forecasting or estimating the value of Y based on X, often without concern for causal mechanisms |
| Names for variable of interest | N/A | Explanatory variable; independent variable; predictor variable; covariate | Outcome variable; dependent variable; response variable |
| Goal of analysis | Summarize and explore data to identify patterns, trends, and relationships | Estimation: test hypotheses or theories and make inferences about the relationship between one or more X variables and Y. Causal attribution: a special form of estimation that makes inferences about the causal relationship between a single X of interest and Y through credible causal assumptions and identification strategies | Generate accurate predictions; maximize the amount of explainable variation in Y while minimizing prediction error |
| Evaluation criteria | N/A | Confidence/credible intervals, coefficient significance, effect sizes, and theoretical consistency | Metrics like root mean square error (RMSE) and R^2; out-of-sample performance |
| Typical approaches | Univariate summary statistics like the mean, median, variance, and standard deviation; multivariate summary statistics like correlations and cross-tabulations | t-tests, proportion tests, multivariate regression models; for causal attribution, careful identification through experiments, quasi-experiments, and other methods with observational data | Multivariate regression models; more complex black-box approaches like machine learning and ensemble models |
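As a rough illustration of how the three approaches in the table above differ in practice, the sketch below applies each one to the same data. Everything in it is an assumption made for illustration and is not taken from the essay: the data are synthetic, the variable names (`x`, `y`) and the helper `ols_fit` are hypothetical, and a simple one-predictor least-squares line stands in for the richer regression models the table mentions. The confidence interval uses the normal critical value 1.96 as an approximation.

```python
import math
import random
import statistics

# Synthetic, purely illustrative data (an assumption, not from the essay):
# x might be per-capita program funding, y an outcome score.
random.seed(42)
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 + 1.5 * xi + random.gauss(0, 1) for xi in x]


def ols_fit(xs, ys):
    """Return (intercept, slope) of a one-predictor least-squares line."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Description: what exists? Summarize distributions and relationships.
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
description = {
    "mean_y": y_bar,
    "median_y": statistics.median(y),
    "sd_y": statistics.stdev(y),
    "corr_xy": sxy / math.sqrt(sxx * syy),  # Pearson correlation
}

# Explanation: focus on X. Estimate the slope and its uncertainty.
intercept, slope = ols_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sigma2 = sum(r ** 2 for r in residuals) / (n - 2)  # residual variance
se_slope = math.sqrt(sigma2 / sxx)
ci_slope = (slope - 1.96 * se_slope, slope + 1.96 * se_slope)  # approx. 95%

# Prediction: focus on Y. Fit on one part of the data, score on the rest.
train_x, test_x = x[:150], x[150:]
train_y, test_y = y[:150], y[150:]
b0, b1 = ols_fit(train_x, train_y)
preds = [b0 + b1 * xi for xi in test_x]
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(test_y, preds))
ss_tot = sum((yi - statistics.fmean(test_y)) ** 2 for yi in test_y)
rmse = math.sqrt(ss_res / len(test_y))  # out-of-sample RMSE
r2 = 1 - ss_res / ss_tot                # out-of-sample R^2

print(description)
print(f"slope = {slope:.2f}, approx. 95% CI = ({ci_slope[0]:.2f}, {ci_slope[1]:.2f})")
print(f"out-of-sample RMSE = {rmse:.2f}, R^2 = {r2:.2f}")
```

The same fitted line serves both explanation and prediction here; what changes is the quantity being reported. The explanatory step reports the coefficient on X with an interval, while the predictive step ignores the coefficient and reports error on held-out values of Y. Because the data are simulated, the slope carries no causal interpretation beyond the simulation itself; causal attribution in real policy data would require an identification strategy of the kind the table describes.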
Table of contents
Introduction
Brief history of statistics in public policy
Core methodological approaches
  Description
  Explanation
  Prediction
The pitfalls of counting, gathering, and learning from public data
Future directions
References
New preprint! A general overview of stats in public policy research with this (oversimplified but still helpful) separation of methods into description, explanation, and prediction #policysky
HTML/PDF: stats.andrewheiss.com/snoopy-spring/
SocArXiv: doi.org/10.31235/osf...