
Posts by ArXiv Paperboy (Stat.ME+Econ.EM)

This paper develops a framework for identification, estimation, and inference on the causal mechanisms driving endogenous social network formation. Identification is challenging because of unobserved confounders and reverse causality; inference is complicated by questions of equilibrium and sampling. We leverage repeated observations of a network over time and random variation in initial ties to address challenges to causal identification. Our design-based approach sidesteps questions of sampling and asymptotics by treating both the set of nodes (individuals) and potential outcomes as non-random. We apply our approach to data from a large professional services firm, where new hires are randomly assigned to project teams within offices. We estimate the causal effects of indirect ties, network degree, and local network density on tie formation. Indirect ties have a strong and significant positive effect on tie formation, while the effects of degree and density are smaller and less robust.
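The design-based approach described here amounts to randomization inference on the network. A minimal sketch under assumptions the abstract does not spell out: initial ties are given as an adjacency matrix, the statistic is a dyad-level regression of later tie formation on shared initial neighbors, and node-label permutation stands in for re-drawing the random team assignment (`effect_stat` and the permutation scheme are illustrative, not the authors' code).

```python
import numpy as np

rng = np.random.default_rng(0)

def effect_stat(initial_ties, later_ties):
    """Illustrative statistic: dyad-level OLS slope of later tie
    formation on the number of shared initial neighbors (indirect ties)."""
    n = initial_ties.shape[0]
    shared = initial_ties @ initial_ties        # common-neighbor counts
    iu = np.triu_indices(n, k=1)                # one row per dyad
    X = np.column_stack([np.ones(len(iu[0])), shared[iu].astype(float)])
    beta = np.linalg.lstsq(X, later_ties[iu].astype(float), rcond=None)[0]
    return beta[1]

def randomization_pvalue(initial_ties, later_ties, draws=2000):
    """Design-based test: relabel nodes of the randomly assigned initial
    network to approximate the statistic's distribution under the null
    of no effect (a stand-in for re-drawing the team assignment)."""
    obs = effect_stat(initial_ties, later_ties)
    n = initial_ties.shape[0]
    null = np.empty(draws)
    for b in range(draws):
        p = rng.permutation(n)
        null[b] = effect_stat(initial_ties[np.ix_(p, p)], later_ties)
    return obs, np.mean(np.abs(null) >= abs(obs))   # two-sided p-value
```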

arXiv📈🤖
Causal inference for social network formation
By Kasy, Linos, Mobasseri

3 minutes ago
Predictions from machine learning algorithms can vary across random seeds, inducing instability in downstream debiased machine learning estimators. We formalize random seed stability via a concentration condition and prove that subbagging guarantees stability for any bounded-outcome regression algorithm. We introduce a new cross-fitting procedure, adaptive cross-bagging, which simultaneously eliminates seed dependence from both nuisance estimation and sample splitting in debiased machine learning. Numerical experiments confirm that the method achieves the targeted level of stability whereas alternatives do not. Our method incurs a small computational penalty relative to standard practice whereas alternative methods incur large penalties.
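Subbagging here means averaging a base learner over subsamples drawn without replacement, which is what damps the seed-to-seed variation. A minimal sketch with an sklearn-style regressor; the bag count, subsample fraction, and base learner are illustrative choices, and this is not the paper's adaptive cross-bagging procedure.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor

def subbag_predict(base, X, y, X_new, n_bags=50, frac=0.5, seed=0):
    """Subbagging: refit the base learner on subsamples drawn WITHOUT
    replacement and average the predictions; the average concentrates,
    so the result depends only weakly on any single random seed."""
    rng = np.random.default_rng(seed)
    n, m = len(y), int(frac * len(y))
    preds = np.zeros(len(X_new))
    for b in range(n_bags):
        idx = rng.choice(n, size=m, replace=False)   # subsample, not bootstrap
        preds += clone(base).fit(X[idx], y[idx]).predict(X_new)
    return preds / n_bags

# usage: predictions move very little across `seed` once n_bags is large
# f_hat = subbag_predict(GradientBoostingRegressor(), X, y, X_new, seed=7)
```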

arXiv📈🤖
Improving reproducibility by controlling random seed stability in machine learning based estimation via bagging
By Williams, Schuler

8 minutes ago
Subsample-based estimation is a standard tool for achieving robustness to outliers in econometric models. This paper shows that, in dynamic time series settings, such procedures are fundamentally invalid under contamination, even under oracle knowledge of contamination locations. The key issue is that contamination propagates through the model's residual filter and distorts the estimation criterion itself. As a result, removing contaminated observations does not, in general, restore the uncontaminated objective or ensure consistency. We characterise this failure as a structural incompatibility between pointwise subsampling and residual propagation. To address it, we propose a propagation-compatible transformation of index sets, formalised through a patch removal operator that removes the residual footprint of contamination. Under suitable conditions, the proposed operator leaves the estimator asymptotically unchanged under the uncontaminated model, while restoring consistency for the clean-data parameter under contamination. The results apply to a broad class of residual-based estimators and show that valid subsample-based estimation in dynamic models requires explicit control of residual propagation.
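The propagation problem is easy to see in an AR(1): the residual at time $t+1$ is built from $y_t$, so an outlier at $t$ contaminates two residuals, and deleting only index $t$ leaves the second one in the objective. A toy sketch of pointwise removal versus removing the full residual footprint, a simplified stand-in for the paper's patch removal operator:

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi = 2000, 0.7
y = np.zeros(n)
for t in range(1, n):                      # clean AR(1) path
    y[t] = phi * y[t - 1] + rng.normal()
bad = rng.choice(np.arange(2, n - 2), size=40, replace=False)
y_c = y.copy()
y_c[bad] += 25.0                           # additive outliers

def fit_ar1(y, keep):
    """Least squares on pairs (y_{t-1}, y_t), restricted to kept indices t."""
    t = np.arange(1, len(y))
    t = t[keep[t]]
    return np.sum(y[t - 1] * y[t]) / np.sum(y[t - 1] ** 2)

keep_point = np.ones(n, bool)
keep_point[bad] = False                    # pointwise: drop index t only
keep_patch = keep_point.copy()
keep_patch[bad + 1] = False                # patch: drop the footprint {t, t+1}

print(fit_ar1(y_c, keep_point))            # biased: y_{t-1} is still an outlier at t = bad + 1
print(fit_ar1(y_c, keep_patch))            # close to phi = 0.7
```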

arXiv📈🤖
Subsample-based Estimation under Dynamic Contamination
By Yang, Sandberg

9 minutes ago
The Mapper algorithm from topological data analysis constructs a graph summarizing the shape of a high-dimensional dataset, and groups of data points identified within this graph are widely interpreted as evidence of distinct subtypes. However, the covariance structure of the data alone can make such groups appear differentiated, even when no subtypes are present. Existing validation approaches do not account for this effect and thus cannot distinguish covariance artifacts from genuine subtypes. We propose a Gaussian null model that generates reference data matching the sample covariance matrix. We pair it with a test statistic that measures mean-level differentiation between communities. In an idealized setting, we prove that covariance geometry alone causes Mapper communities to differ in their average feature profiles, and we show that a simpler label-permutation baseline cannot detect this effect. Simulations confirm well-controlled Type I error under Gaussian data. We apply the framework to four published Mapper analyses spanning breast cancer gene expression, Congressional voting, NBA player performance, and lower-grade glioma genomics. In every case, once outlier singleton communities are accounted for, the observed differentiation does not exceed what the null produces at the $\alpha = 0.05$ level. This result does not rule out subtypes in these datasets, but it does indicate that the observed structure is consistent with what covariance geometry alone can produce. Stronger evidence would be needed to support a subtype claim.
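The null model itself is simple to instantiate: draw reference datasets from a Gaussian with the sample mean and covariance, re-run the community pipeline on each draw, and compare differentiation statistics. A sketch in which `community_fn` is a user-supplied stand-in for the full Mapper-plus-community-detection pipeline, and the statistic is one plausible choice rather than necessarily the paper's:

```python
import numpy as np

def differentiation_stat(X, labels):
    """Size-weighted mean distance between community centroids and the
    overall mean: one way to measure mean-level differentiation."""
    mu, n, stat = X.mean(axis=0), len(X), 0.0
    for c in np.unique(labels):
        idx = labels == c
        stat += idx.sum() / n * np.linalg.norm(X[idx].mean(axis=0) - mu)
    return stat

def covariance_null_pvalue(X, community_fn, draws=200, seed=0):
    """Draw reference data from N(xbar, sample covariance), re-run the
    community pipeline on each draw, and compare statistics."""
    rng = np.random.default_rng(seed)
    obs = differentiation_stat(X, community_fn(X))
    mu, S = X.mean(axis=0), np.cov(X, rowvar=False)
    null = np.array([differentiation_stat(Xr, community_fn(Xr))
                     for Xr in (rng.multivariate_normal(mu, S, size=len(X))
                                for _ in range(draws))])
    return obs, (1 + np.sum(null >= obs)) / (1 + draws)   # Monte Carlo p-value
```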

arXiv📈🤖
A Null Model for Mapper Subtype Claims
By Topaz

12 minutes ago
Breast cancer is the most prevalent cancer in women worldwide. Histopathology image analysis serves as the gold standard for cancer diagnosis. In this regard, whole-slide imaging (WSI), a revolutionary technology in digital pathology, allows for ultrahigh-resolution tissue analysis. Despite its promise, WSI analysis faces significant computational challenges due to its massive data size and tissue heterogeneity. To address this issue, we present a Gaussian-mixture-based multiple instance learning (MIL) framework for WSI analysis with partially subsampled instances. Our approach models a WSI as a bag of instances (i.e., randomly cropped sub-images), leveraging a bag-based maximum likelihood estimator (BMLE) to predict metastases. Furthermore, we introduce a subsampling-based maximum likelihood estimator (SMLE) to refine predictions by selectively labeling a subset of instances. Extensive evaluations on breast carcinoma metastasis prediction demonstrate that BMLE surpasses state-of-the-art methods, while the SMLE further improves prediction accuracy at both bag and instance levels. We find that our method is fairly robust against various plausible model mis-specifications. Theoretical analyses and simulation studies validate the performance and robustness of our methods.

arXiv📈🤖
Detecting Breast Carcinoma Metastasis on Whole-Slide Images by Partially Subsampled Multiple Instance Learning
By Yu, Li, Zhou et al

15 minutes ago
Models with fewer parameters are often easier to interpret and more robust. Parsimony can be achieved by optimizing objectives like the AIC or BIC, which are functions of the number of free parameters in the model. Optimizing these discrete objectives is a challenge, often relying on combinatorial search. We construct smooth functions whose optima coincide with those of these objectives but permit continuous rather than discrete optimization, relieving some of the selection burden. Proofs of convergence are provided, and a novel method of clustering through explicit overparameterization shows promising results.
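One standard way to smooth a parameter count is the rational surrogate $\theta^2/(\theta^2+\gamma)$, which tends to $1\{\theta \neq 0\}$ as $\gamma \to 0$. A minimal continuation sketch in that spirit; the specific surrogate sequence, penalty weight, and optimizer here are illustrative, not the paper's construction:

```python
import numpy as np
from scipy.optimize import minimize

def smooth_count(theta, gamma):
    """Smooth surrogate for the number of nonzero parameters:
    sum of theta^2 / (theta^2 + gamma) -> #{theta != 0} as gamma -> 0."""
    return np.sum(theta ** 2 / (theta ** 2 + gamma))

def fit_aic_like(X, y, lam=2.0, gammas=(1.0, 1e-1, 1e-2, 1e-4)):
    """Continuation scheme: minimize RSS + lam * smooth_count while
    shrinking gamma, so the smoothed objective approaches the discrete
    AIC-style criterion RSS + lam * (number of nonzeros)."""
    theta = np.linalg.lstsq(X, y, rcond=None)[0]     # warm start at OLS
    for g in gammas:
        obj = lambda t, g=g: np.sum((y - X @ t) ** 2) + lam * smooth_count(t, g)
        theta = minimize(obj, theta, method="BFGS").x
    return theta
```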

arXiv📈🤖
Model Selection and Parameter Inference through Constraints via Sequences of Surrogate Smoothing Functions
By Shaikh

18 minutes ago
Computer simulations play an important role in scientific discovery and engineering innovation. Reliable computer models enable virtual experimentation that reduces the need for costly and time-consuming physical testing. However, the credibility of such models hinges on rigorous statistical validation against real-world data. This paper develops a formal frequentist framework for both global and subdomain validation of computer models. We propose the Fourier Maximum Modulus Test (FMMT), which leverages kernel ridge regression (KRR) to estimate the discrepancy between the computer model and the physical process, followed by a frequency-domain test based on weighted generalized Fourier coefficients. The theoretical analysis establishes the asymptotic normality of these coefficients, allowing for closed-form p-values. Simulation studies and a shear-layer experiment demonstrate that FMMT achieves high power, accurate Type I error control, and strong sensitivity to localized discrepancies.

arXiv📈🤖
Statistical Validation of Computer Models: Global and Subdomain Hypothesis Testing
By Li, Zhang, Tuo

21 minutes ago
The present study investigates a cluster cleaning algorithm that is both computationally simple and capable of solving the PU classification problem when the SCAR condition is not satisfied. A secondary objective is to determine the robustness of the LassoJoint method to perturbations of the SCAR condition. In the first step of our algorithm, we obtain cleaning labels from 2-means clustering. We then perform logistic regression on the cleaned data, with positive labels given by the cleaning step together with the additional known true positive observations; the remaining observations are assigned the negative label. The proposed algorithm is evaluated on 11 real data sets from machine learning repositories and a synthetic set. The findings demonstrate the efficacy of the clustering algorithm in scenarios where the SCAR condition is violated and further underscore the moderate robustness of the LassoJoint algorithm in this context.
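The two-step recipe translates almost directly into code. A sketch assuming feature matrices for the labeled positives and the unlabeled pool; the rule for picking the "positive" cluster (the centroid closer to the labeled positives) is one natural reading of the cleaning step:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def pu_cluster_clean_fit(X_pos, X_unl):
    """Step 1: 2-means on the unlabeled pool; call the cluster whose
    centroid lies closer to the labeled positives 'cleaned positive'."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_unl)
    d = np.linalg.norm(km.cluster_centers_ - X_pos.mean(axis=0), axis=1)
    clean_pos = km.labels_ == np.argmin(d)

    # Step 2: logistic regression with positives = cleaned positives
    # plus the known true positives; everything else labeled negative.
    X = np.vstack([X_pos, X_unl])
    y = np.concatenate([np.ones(len(X_pos)), clean_pos.astype(float)])
    return LogisticRegression(max_iter=1000).fit(X, y)
```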

arXiv📈🤖
A proposal for PU classification under Non-SCAR using clustering and logistic model
By Furmanczyk, Paczutkowski

24 minutes ago
Empirical researchers often use diagnostic checks to assess the plausibility of their modeling assumptions, such as testing for covariate balance in RCTs, pre-trends in event studies, or instrument validity in IV designs. While these checks are traditionally treated as external hurdles to estimation, we argue they should be integrated into the estimation process itself. In particular, we propose residualizing one's baseline estimator against the vector of diagnostic check statistics to remove the component of baseline sampling variation explained by the diagnostic checks. This residualized estimator offers researchers a "free lunch," delivering three properties simultaneously: (i) eliminating inference distortions from check-based selective reporting; (ii) reducing variance without changing the estimand when the baseline model is correctly specified; and (iii) minimizing worst-case bias under bounded local misspecification within the class of linear adjustments. We apply our method to the RCT in Kaur et al. (2024) and find that, even in a setting where all balance checks pass comfortably, residualization increases the magnitude of the baseline point estimate and reduces its standard error, equivalent to approximately a 10% increase in sample size.
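The residualized estimator is a linear projection: subtract from the baseline estimate its best linear predictor given the check statistics, $\hat\theta - \widehat{\mathrm{Cov}}(\hat\theta, T)\,\widehat{\mathrm{Var}}(T)^{-1} T$. A sketch that estimates the joint sampling covariance by the nonparametric bootstrap; `estimate_fn` and `checks_fn` are placeholders for the user's estimator and diagnostic statistics, assumed centered at zero under correct specification:

```python
import numpy as np

def residualize(estimate_fn, checks_fn, data, n_boot=500, seed=0):
    """theta_res = theta_hat - Cov(theta_hat, T) Var(T)^{-1} T, with the
    joint sampling covariance of (theta_hat, T) estimated by the
    nonparametric bootstrap over rows of `data`."""
    rng = np.random.default_rng(seed)
    n = len(data)
    draws = np.array([
        [estimate_fn(data[idx]), *checks_fn(data[idx])]
        for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))
    ])
    cov = np.cov(draws, rowvar=False)
    gamma = np.linalg.solve(cov[1:, 1:], cov[0, 1:])   # projection coefficients
    return estimate_fn(data) - gamma @ np.asarray(checks_fn(data))
```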

arXiv📈🤖
Integrating Diagnostic Checks into Estimation
By Sarfati, Vilfort

26 minutes ago
Online controlled experiments face growing challenges from overlapping tests on shared traffic, where interactions between concurrent experiments obscure insights into feature combinations and produce effect estimates that do not correspond to any actionable launch scenario. While traffic splitting, layering, and sequential execution (non-concurrent) mitigate some of these issues, they require coordination overhead and can reduce experimentation velocity. We propose Multi-Experiment Analysis (MEA), a methodology for consistent joint estimation in the presence of arbitrary partial or full overlaps and multiple variants. MEA produces three types of estimates: (1) corrected individual treatment effects that account for the presence of overlapping experiments, (2) combined effects of launching any desired combination of variants across experiments, and (3) conditional effects of an experiment's variant given that specific variants of other experiments are launched or deramped -- all without requiring factorial pre-design or traffic restrictions. We validate the approach through comprehensive simulations confirming consistency and correct coverage. We report on production deployment at scale, illustrate the methodology through real-world use cases, and share practical lessons learned -- including system design, adoption patterns, and insights from production use.
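One way to realize consistent joint estimation over overlapping experiments is a single regression on per-experiment exposure dummies plus their interactions, from which individual, combined, and conditional launch effects are linear combinations of coefficients. A simplified sketch (binary variants, pairwise interactions only); the production MEA system presumably handles multiple variants and inference as well:

```python
import numpy as np

def mea_joint_effects(y, assign):
    """Regress the metric on an intercept, one dummy per experiment's
    treatment, and all pairwise interaction dummies. assign[:, k] is 1 if
    the unit received experiment k's treatment, 0 if control, and np.nan
    if not enrolled (partial overlap -> zero exposure)."""
    A = np.nan_to_num(assign)
    n, K = A.shape
    cols = [np.ones(n)] + [A[:, k] for k in range(K)]
    cols += [A[:, i] * A[:, j] for i in range(K) for j in range(i + 1, K)]
    beta = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)[0]
    main, inter = beta[1:K + 1], beta[K + 1:]
    launch_all = main.sum() + inter.sum()   # combined effect of shipping everything
    return main, inter, launch_all
```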

arXiv📈🤖
Multi-Experiment Analysis
By Hosseini

30 minutes ago
Multivariate Pearson diffusions, also known as polynomial diffusions, are characterized by a linear drift vector and a diffusion matrix that is quadratic in the state variables. We derive exact closed-form expressions for the mean and covariance matrix of this class by using results on matrix exponential integrals. We then extend this framework to a broader class of nonlinear diffusions with Pearson-type multiplicative noise. The main contribution of this paper is a new parameter estimator for these nonlinear models based on Strang splitting (SS). The proposed method decomposes the stochastic system into a deterministic nonlinear ordinary differential equation (ODE) and a multivariate Pearson diffusion. We construct the SS estimator by composing their respective flows and applying a Gaussian transition approximation parameterized by the exact moments of the Pearson component. We prove that the SS estimator is consistent and asymptotically efficient. Furthermore, we introduce a new model within this broader class, which we call the Student Kramers oscillator, and we prove existence and uniqueness of the strong solution as well as existence of an invariant measure. We evaluate the SS estimator through simulation studies on this new oscillator and on the multivariate Wright-Fisher diffusion from population genetics. These simulations demonstrate that the SS estimator outperforms the standard Euler-Maruyama estimator, the Gaussian approximation estimator, and the local linearization estimator. Finally, we apply the SS estimator to fit the Student Kramers oscillator to Greenland ice core data.

arXiv📈🤖
Strang splitting estimator for nonlinear multivariate stochastic differential equations with Pearson-type multiplicative noise
By Pilipović, Samson, Ditlevsen

35 minutes ago
External validation is widely regarded as the gold standard for prognostic model evaluation. In this study, we challenge the assumption that successful external calibration guarantees model generalizability and propose two complementary strategies to improve transportability of prognostic models across cohorts. Using six real-world surgical cohorts from tertiary academic centers, we tested whether successful external calibration depends largely on similarity in covariates and outcomes between training and validation cohorts, quantified using Kullback-Leibler (KL) divergence, with calibration assessed by the Integrated Calibration Index (ICI). From the model-developer's perspective, we trained the "best-on-average" prognostic model by tuning toward a meta-analysis-derived covariate and outcome distribution as an approximation of the broader target population. From the end-user perspective, we proposed a simple measure for cohort outcome similarity to identify, among published models, the one most suitable for a given target cohort in terms of both calibration and clinical utility. External calibration worsened as distributional mismatch increased. Higher KL divergence was associated with higher ICI in both surgery-alone (Spearman $\rho=0.614$, $p=0.004$) and surgery + adjuvant chemotherapy cohorts (Spearman $\rho=0.738$, $p<0.001$). Meta-analysis-informed weighting improved calibration in most settings without materially affecting discrimination, with the clearest benefit when evaluated on the aggregated external population ($p=0.037$). Models developed in more similar cohorts achieved lower ICI in surgery-alone (Spearman $\rho=0.803$, $p<0.001$) and surgery + adjuvant chemotherapy cohorts (Spearman $\rho=0.737$, $p<0.001$), and provided greater clinical utility on DCA.
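Both quantities in the reported association are directly computable. A sketch under a Gaussian approximation to each cohort's covariate distribution for the KL divergence, with the ICI computed as the mean absolute gap between predicted risks and a lowess calibration curve; the paper's exact estimators may differ:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def gaussian_kl(X_train, X_valid):
    """KL( N(m1, S1) || N(m0, S0) ) between Gaussian fits to the
    validation and training cohorts' covariates."""
    m1, S1 = X_valid.mean(0), np.cov(X_valid, rowvar=False)
    m0, S0 = X_train.mean(0), np.cov(X_train, rowvar=False)
    S0inv, d, k = np.linalg.inv(S0), m0 - m1, len(m0)
    return 0.5 * (np.trace(S0inv @ S1) + d @ S0inv @ d - k
                  + np.log(np.linalg.det(S0) / np.linalg.det(S1)))

def ici(y, p_hat, frac=0.6):
    """Integrated Calibration Index: mean |calibration curve - prediction|,
    with the curve from a lowess of binary outcomes on predicted risks."""
    smoothed = lowess(y, p_hat, frac=frac, return_sorted=False)
    return np.mean(np.abs(smoothed - p_hat))
```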

arXiv📈🤖
Robustifying and Selecting Cohort-Appropriate Prognostic Models under Distributional Shifts
By Bertsimas, Gao, Koulouras et al

38 minutes ago
In this paper, we consider estimation of the average treatment effect on the treated (ATT), an interpretable causal estimand relevant to policy makers when treatment assignment is endogenous. By considering shadow variables that are unrelated to the treatment assignment but related to the outcomes of interest, we establish identification of the ATT. We then focus on efficient estimation of the ATT by characterizing the geometric structure of the likelihood, deriving the semiparametric efficiency bound for ATT estimation, and proposing an estimator that achieves this bound. We rigorously establish the theoretical properties of the proposed estimator. Its finite-sample performance is studied through comprehensive simulation studies as well as an application to our motivating study.

arXiv📈🤖
Efficient Estimation of Average Treatment Effect on the Treated under Endogenous Treatment Assignment
By Ghosh, Shan, Yu et al

11 hours ago
Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in epidemiological or survey settings, individuals with certain outcomes may be more likely to be included, resulting in biased prevalence estimates with potentially substantial downstream impact. Classical corrections, such as inverse-probability weighting or explicit likelihood-based models of the selection process, rely on tractable likelihoods, which limits their applicability in complex stochastic models with latent dynamics or high-dimensional structure. Simulation-based inference enables Bayesian analysis without tractable likelihoods but typically assumes missingness at random and thus fails when selection depends on unobserved outcomes or covariates. Here, we develop a bias-aware simulation-based inference framework that explicitly incorporates selection into neural posterior estimation. By embedding the selection mechanism directly into the generative simulator, the approach enables amortized Bayesian inference without requiring tractable likelihoods. This recasting of selection bias as part of the simulation process allows us to both obtain debiased estimates and explicitly test for the presence of bias. The framework integrates diagnostics to detect discrepancies between simulated and observed data and to assess posterior calibration. The method recovers well-calibrated posterior distributions across three statistical applications with diverse selection mechanisms, including settings in which likelihood-based approaches yield biased estimates. These results recast the correction of selection bias as a simulation problem and establish simulation-based inference as a practical and testable strategy for parameter estimation under selection bias.
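The key move is to place the selection mechanism inside the simulator, so simulated data carry the same bias as observed data. A toy sketch with rejection ABC standing in for neural posterior estimation (the paper uses amortized neural methods); `p_select` is an assumed, known outcome-dependent inclusion probability:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_selected(theta, n_obs, p_select, batch=1000):
    """Generative model WITH the selection mechanism inside: draw outcomes
    and keep each with outcome-dependent probability until n_obs remain."""
    kept = []
    while len(kept) < n_obs:
        y = rng.normal(theta, 1.0, size=batch)
        kept.extend(y[rng.random(batch) < p_select(y)])
    return np.array(kept[:n_obs])

def abc_posterior(y_obs, p_select, n_draws=20000, tol=0.05):
    """Rejection ABC stand-in for neural posterior estimation: because the
    simulator reproduces the selection, matching simulated and observed
    summaries targets the selection-corrected posterior."""
    s_obs = y_obs.mean()
    thetas = rng.normal(0.0, 3.0, size=n_draws)          # prior draws
    return np.array([t for t in thetas if
                     abs(simulate_selected(t, len(y_obs), p_select).mean() - s_obs) < tol])

# usage: selection favoring large outcomes biases the naive mean upward,
# while the ABC posterior re-centers near the true theta
# post = abc_posterior(y_obs, p_select=lambda y: 1.0 / (1.0 + np.exp(-y)))
```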

arXiv📈🤖
Overcoming Selection Bias in Statistical Studies With Amortized Bayesian Inference
By Arruda, Chervet, Staudt et al

11 hours ago
This paper investigates the off-policy evaluation (OPE) problem from a distributional perspective. Rather than focusing solely on the expectation of the total return, as in most existing OPE methods, we aim to estimate the entire return distribution. To this end, we introduce a quantile-based approach for OPE using deep quantile process regression, presenting a novel algorithm called Deep Quantile Process regression-based Off-Policy Evaluation (DQPOPE). We provide new theoretical insights into the deep quantile process regression technique, extending existing approaches that estimate discrete quantiles to estimate a continuous quantile function. A key contribution of our work is the rigorous sample complexity analysis for distributional OPE with deep neural networks, bridging theoretical analysis with practical algorithmic implementations. We show that DQPOPE achieves statistical advantages by estimating the full return distribution using the same sample size required to estimate a single policy value using conventional methods. Empirical studies further show that DQPOPE provides significantly more precise and robust policy value estimates than standard methods, thereby enhancing the practical applicability and effectiveness of distributional reinforcement learning approaches.

arXiv📈🤖
Distributional Off-Policy Evaluation with Deep Quantile Process Regression
By Kuang, Wang, Jiao et al

11 hours ago
Machine learning has become integral to medical research and is increasingly applied in clinical settings to support diagnosis and decision-making; however, its effectiveness depends on access to large, diverse datasets, which are limited within single institutions. Although integrating data across institutions can address this limitation, privacy regulations and data ownership constraints hinder these efforts. Federated learning enables collaborative model training without sharing raw data; however, most methods rely on complex architectures that lack interpretability, limiting clinical applicability. We therefore propose a federated RuleFit framework to construct a unified and interpretable global model for distributed environments. It integrates three components: preprocessing based on differentially private histograms to estimate shared cutoff values, enabling consistent rule definitions and reducing heterogeneity across clients; local rule generation using gradient boosting decision trees with shared cutoffs; and coefficient estimation via $\ell_1$-regularized optimization using a Federated Dual Averaging algorithm for sparse and consistent variable selection. In simulation studies, the proposed method achieved a performance comparable to that of centralized RuleFit while outperforming existing federated approaches. Real-world analysis demonstrated its ability to provide interpretable insights with competitive predictive accuracy. Therefore, the proposed framework offers a practical and effective solution for interpretable and reliable modeling in federated learning environments.
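The first component, differentially private histograms for shared cutoffs, can be sketched with the standard Laplace mechanism on per-client bin counts; the feature range is treated as public here, and the quantile read-off is one simple choice rather than the paper's exact procedure:

```python
import numpy as np

def dp_histogram(x, edges, epsilon, rng):
    """Laplace mechanism on bin counts (sensitivity 1 per record)."""
    counts, _ = np.histogram(x, bins=edges)
    return np.clip(counts + rng.laplace(scale=1.0 / epsilon, size=len(counts)), 0, None)

def shared_cutoffs(client_data, lo, hi, bins=64, epsilon=1.0, n_cuts=8, seed=0):
    """Aggregate private histograms across clients, then read approximate
    quantiles off the pooled noisy histogram as shared split points.
    (lo, hi) is a public feature range agreed on in advance."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(lo, hi, bins + 1)
    total = sum(dp_histogram(x, edges, epsilon, rng) for x in client_data)
    cdf = np.cumsum(total) / total.sum()
    qs = np.linspace(0.0, 1.0, n_cuts + 2)[1:-1]         # interior quantiles
    return np.interp(qs, cdf, edges[1:])                 # cutoffs shared by all clients
```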

arXiv📈🤖
Federated Rule Ensemble Method in Medical Data
By Wan, Tanioka, Shimokawa

11 hours ago
We propose post-screening portfolio selection (PS$^2$), a two-step framework for high-dimensional mean--variance investing. First, assets are screened by Lasso-type regression of a constant on excess returns without an intercept. Second, portfolio weights are estimated on the selected set using standard low-dimensional methods. Because strong factors can destroy sparsity in real data, we further introduce PS$^2$ with factors (FPS$^2$), which defactors returns before screening and allows factor investing in the final step. We establish theoretical guarantees, and simulations and an empirical application show competitive performance, especially when sparse screening is appropriate or strong factors are explicitly accommodated.
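The screening step is unusual but compact: Lasso-regress a vector of ones on the excess-return matrix with no intercept, and keep the assets with nonzero coefficients. A sketch that follows with plug-in tangency weights on the selected set; the Lasso penalty level and the second-step estimator are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import Lasso

def ps2_weights(R, alpha=0.01):
    """R: T x p matrix of excess returns.
    Screen: Lasso-regress the constant 1 on R with no intercept; the
    nonzero coefficients mark the selected assets.
    Estimate: plug-in tangency weights on the selected set."""
    T, p = R.shape
    sel = np.flatnonzero(Lasso(alpha=alpha, fit_intercept=False)
                         .fit(R, np.ones(T)).coef_)
    mu = R[:, sel].mean(axis=0)
    Sigma = np.atleast_2d(np.cov(R[:, sel], rowvar=False))
    w_sel = np.linalg.solve(Sigma, mu)                   # unnormalized weights
    w = np.zeros(p)
    w[sel] = w_sel / np.abs(w_sel).sum()
    return w
```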

arXiv📈🤖
Post-Screening Portfolio Selection
By Uematsu, Tanaka

11 hours ago
Double/debiased machine learning (DML) provides a general framework for inference with high-dimensional or otherwise complex nuisance parameters by combining Neyman-orthogonal scores with cross-fitting, thereby circumventing classical Donsker-type conditions in many modern machine-learning settings. Despite its strong empirical performance, bootstrap inference for DML estimators has received little theoretical justification. This is particularly noteworthy since bootstrap methods are suggested and used for inference on DML estimators, even though bootstrap procedures can fail for estimators that are root-$n$ consistent and asymptotically normal. This paper fills this gap by establishing bootstrap validity for DML estimators under general exchangeably weighted resampling schemes, with Efron's bootstrap as a special case. Under exactly the same conditions required for the validity of DML itself, we prove that the bootstrap law converges conditionally weakly to the sampling law of the original estimator.
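For the partially linear model, both the DML estimator and its weighted bootstrap reduce to weighted solutions of the orthogonal score, so the resampling adds almost no computation. A sketch using multinomial weights (Efron's bootstrap, one of the exchangeable schemes covered); the nuisance learners and fold count are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plr_bootstrap(y, d, X, n_boot=500, seed=0):
    """Cross-fitted partialling-out for the partially linear model, then a
    weighted bootstrap of the orthogonal-score solution
    theta(w) = sum_i w_i vres_i yres_i / sum_i w_i vres_i^2.
    Multinomial weights give Efron's bootstrap; Dirichlet weights would
    give the Bayesian bootstrap, another exchangeable scheme."""
    n = len(y)
    yres, dres = np.zeros(n), np.zeros(n)
    for tr, te in KFold(5, shuffle=True, random_state=seed).split(X):
        yres[te] = y[te] - RandomForestRegressor(random_state=0).fit(X[tr], y[tr]).predict(X[te])
        dres[te] = d[te] - RandomForestRegressor(random_state=0).fit(X[tr], d[tr]).predict(X[te])
    theta = np.sum(dres * yres) / np.sum(dres ** 2)
    W = np.random.default_rng(seed).multinomial(n, np.full(n, 1 / n), size=n_boot)
    boots = (W * dres * yres).sum(1) / (W * dres ** 2).sum(1)
    ci = theta - np.quantile(boots - theta, [0.975, 0.025])   # basic bootstrap CI
    return theta, ci
```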

arXiv📈🤖
Bootstrap consistency for general double/debiased machine learning estimators
By Lin, Han

11 hours ago
Bitcoin transaction fees will become more important as the block subsidy declines, but fee formation is hard to study with blockchain data alone because the relevant queueing environment is unobserved. We develop and estimate a structural model of Bitcoin fee choice that treats the mempool as a market for scarce blockspace. We assemble a novel, high-frequency mempool panel from a self-run Bitcoin node that records transaction arrivals, exits, block inclusion, fee-bumping events, and congestion snapshots. We characterize the fee market as a Vickrey-Clarke-Groves mechanism and derive an equation to estimate fees. In the first stage, we estimate a monotone delay technology linking fee-rate priority and network state to expected confirmation delay. We then estimate how fees respond to that delay technology and to transaction characteristics. We find that congestion is the main determinant of delay; that the marginal value of priority is priced in fees, increasing in the gradient of confirmation-time reduction per step up the fee queue; and that transactors' choices of RBF, CPFP, and block conditions have economically important effects on fees.

arXiv📈🤖
A Model and Estimation of the Bitcoin Transaction Fee
By Aronoff, Praizner, Sabouri

11 hours ago
Sparsity or complexity? In modern high-dimensional asset pricing, these are often viewed as competing principles: richer feature spaces appear to favor complexity, while economic intuition has long favored parsimony. We show that this tension is misplaced. We distinguish capacity sparsity (the dimensionality of the candidate feature space) from factor sparsity (the parsimonious structure of priced risks) and argue that the two are complements: expanding capacity enables the discovery of factor sparsity. Revisiting the benchmark empirical design of Didisheim et al. (2025) and pushing it to higher complexity regimes, we show that nonlinear feature expansions combined with basis pursuit yield portfolios whose out-of-sample performance dominates ridgeless benchmarks beyond a critical complexity threshold. The evidence shows that the gains from complexity arise not from retaining more factors, but from enlarging the space from which a sparse structure of priced risks can be identified. The virtue of complexity in asset pricing operates through factor sparsity.

arXiv📈🤖
The Virtue of Sparsity in Complexity
By Afsharhajari, Li

11 hours ago
The paper considers the computation of L1 regularization paths in a state space setting, which includes L1 regularized Kalman smoothing, linear SVM, LASSO, and more. The paper proposes two new algorithms, which are duals of each other; the first algorithm applies to L1 regularization of independent variables while the second applies to L1 regularization of dependent variables. The heart of the proposed algorithms is parametric Gaussian message passing (i.e., Kalman-type forward-backward recursions) in the pertinent factor graphs. The proposed methods are broadly applicable, they (usually) require only matrix multiplications, and their complexity can be competitive with prior methods in some cases.

arXiv📈🤖
L1 Regularization Paths in Linear Models by Parametric Gaussian Message Passing
By Li, Loeliger

11 hours ago
Factor-based Structural Equation Modeling (SEM) relies on likelihood-based estimation assuming a nonsingular sample covariance matrix, which breaks down in small-sample settings with $p>n$. To address this, we propose a novel estimation principle that reformulates the covariance structure into self-covariance and cross-covariance components. The resulting framework defines a likelihood-based feasible set combined with a relative error constraint, enabling stable estimation of the sign and direction of structural parameters in small-sample settings where $p>n$. Experiments on synthetic and real-world data show improved stability, particularly in recovering the sign and direction of structural parameters. These results extend covariance-based SEM to small-sample settings and provide practically useful directional information for decision-making.

arXiv📈🤖
Covariance-Based Structural Equation Modeling in Small-Sample Settings with $p>n$
By Hasegawa, Tamura, Okada

11 hours ago
Predictive inference in the sparse Gaussian sequence model has received considerably less attention than its non-sparse, finite-sample counterpart. Existing work has largely been confined to discrete mixture priors. In this paper, we study predictive inference under a widely used continuous mixture prior, the Horseshoe. We provide new theoretical results establishing exact asymptotic minimax optimality of the predictive Bayes estimator when the sparsity level is known. Furthermore, through a Gaussian-mixture representation of the posterior predictive density (which we term Horseshoe spectroscopy), the phase transition in the local shrinkage scale is inherited by the predictive mechanism, producing behavior similar to that of previous thresholding/switching estimators. When sparsity is unknown, we adopt a fully Bayesian approach using a hierarchical Horseshoe prior and show that it performs adaptive, as opposed to manual, switching. Under a $\theta$-min condition, the resulting predictive risk admits an upper bound over a restricted parameter class that is sharper than the minimax rate over the full class. We demonstrate the practical value of predictive Horseshoe shrinkage on data such as images and time series that can be naturally modeled as sparse Gaussian sequences. We illustrate this approach on facial recognition across varying facial expressions and study region-wise atypical brain lateralization in autism spectrum disorder.

arXiv📈🤖
Horseshoe Predictive Inference
By Zhai, Ro\v{c}kov\'a

11 hours ago 0 0 0 0
AI in applications like screening job applicants has become widespread and may contribute to unemployment, especially among the young. Biases in the AIs may become baked into the job selection process, but even in their absence, reliance on a single AI is problematic. In this paper we derive a simple formula to estimate, or at least place an upper bound on, the precision of such approaches for data resembling realistic CVs:
  $P(q) \approx \frac{\rho n^b + q(1-\rho)}{1 + (n^b - 1)\rho},$
where $P(q)$ is the precision of the top $q$ quantile selected by a panel of $n$ AIs, $\rho$ is their average pairwise correlation, $b \approx q^* + 0.8 (1 - \rho)$, and $q^*$ is $q$ clipped to $[0.07, 0.22]$. This equation provides a basis for deciding how many AIs should be used in a panel, depending on the importance of the decision. A quantitative discussion of the merits of using a diverse panel of AIs to support decision-making in such areas can move debate away from dangerous reliance on single AI systems and encourage a balanced assessment of how much diversity needs to be built into the AI parts of the socioeconomic systems that are so important for our future.
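As a quick numerical check, here is a minimal sketch of the formula above (the function name and example values are ours, not the paper's):

```python
import numpy as np

# Minimal sketch of the abstract's precision formula; symbols as defined above.
# q: selected top quantile, n: panel size, rho: average pairwise correlation.
def panel_precision(q: float, n: int, rho: float) -> float:
    q_star = float(np.clip(q, 0.07, 0.22))  # q* is q clipped to [0.07, 0.22]
    b = q_star + 0.8 * (1.0 - rho)          # b ~ q* + 0.8(1 - rho)
    return (rho * n**b + q * (1.0 - rho)) / (1.0 + (n**b - 1.0) * rho)

# Illustration: precision of the top decile, single AI versus a panel of five.
print(panel_precision(q=0.10, n=1, rho=0.5))  # single AI
print(panel_precision(q=0.10, n=5, rho=0.5))  # panel of 5
```

Note that for $n=1$ the formula reduces to $P(q) = \rho + q(1-\rho)$, so any gain from a panel enters entirely through the $n^b$ term.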

arXiv📈🤖
Quantifying how AI Panels improve precision
By Beale

11 hours ago 0 0 0 0
Determining the number of factors in high-dimensional factor models remains a fundamental challenge, particularly when data are incomplete. This paper introduces the concept of identifiable factors, those that can be reliably recovered despite missing observations, and proposes the Missingness-Adaptive Thresholding Estimator (MATE). To our knowledge, MATE is the first missingness-adaptive framework for factor number determination that accommodates both homogeneous and heterogeneous missingness without imposing restrictive assumptions on factor strength. Notably, it operates without data imputation, circumventing the computational burden associated with most existing approaches. We establish a rigorous theoretical foundation for MATE, proving its consistency under a range of structural conditions. Extensive simulations and real-world applications demonstrate that MATE consistently outperforms state-of-the-art methods, exhibiting superior robustness in settings with high missingness rates and weak factor signals.
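For orientation only, here is a generic eigenvalue-thresholding sketch of the kind of estimator the abstract describes; this is not the paper's MATE, and both the pairwise-complete covariance and the threshold constant are our own illustrative choices:

```python
import numpy as np

# Illustrative only, NOT the paper's MATE: estimate the number of factors by
# thresholding eigenvalues of a covariance matrix built from pairwise-complete
# observations (X is n-by-p with NaN marking missing entries; no imputation).
def n_factors_by_thresholding(X: np.ndarray, c: float = 1.0) -> int:
    n, p = X.shape
    mask = ~np.isnan(X)
    Xc = np.where(mask, X - np.nanmean(X, axis=0), 0.0)  # center, zero-fill gaps
    counts = mask.T.astype(float) @ mask.astype(float)   # pairwise sample sizes
    S = (Xc.T @ Xc) / np.maximum(counts, 1.0)            # pairwise-complete cov.
    eigenvalues = np.linalg.eigvalsh(S)
    threshold = c * p * np.sqrt(np.log(p) / n)           # illustrative threshold
    return int(np.sum(eigenvalues > threshold))
```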

arXiv📈🤖
Missingness-Adaptive Factor Identification in High-Dimensional Data
By Zeng, Zeng, Zhu

11 hours ago 1 0 0 0
We propose an empirical Bayes framework for aggregating estimators obtained from several identification functionals associated with the same causal parameter. The central object is a posterior mean that pools a collection of asymptotically linear estimators of a scalar causal target. We establish consistency in two non-nested regimes: exact identifiability, in which every functional identifies the same causal effect; and a second regime, in which individual functionals are biased but the identification biases are mean-zero across functionals, and the number of functionals grows with sample size. The dependence induced by evaluating all estimators on the same sample is handled through a working independence device that preserves consistency of the point estimator. Inference is organized around a latent heterogeneity hyperparameter: when it vanishes, the functionals share a common target and we report frequentist confidence intervals for that target via a sandwich variance or subsampling; when it is strictly positive, each functional targets a genuine draw from a mixing distribution and we construct asymptotically valid Bayesian prediction intervals for the latent target of a new functional. The two inferential outputs rest on distinct assumption sets and are, therefore, complementary rather than exclusive. We illustrate the framework in the context of augmenting randomized controlled trials with observational evidence.
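To fix ideas, here is a textbook normal-normal empirical Bayes pooling step under a working-independence assumption; this is a generic sketch (with a DerSimonian-Laird-style moment estimate of the heterogeneity hyperparameter), not the paper's procedure:

```python
import numpy as np

# Generic sketch: pool K estimates theta_k (one per identification functional)
# with standard errors se_k, treating them as independent (working independence).
def eb_pool(theta, se):
    theta, v = np.asarray(theta, float), np.asarray(se, float) ** 2
    w = 1.0 / v
    mu_fe = np.sum(w * theta) / np.sum(w)         # fixed-effect (common-target) pool
    K = len(theta)
    Q = np.sum(w * (theta - mu_fe) ** 2)          # heterogeneity statistic
    tau2 = max(0.0, (Q - (K - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_star = 1.0 / (v + tau2)                     # shrinkage weights
    mu = np.sum(w_star * theta) / np.sum(w_star)  # posterior mean of the target
    return mu, tau2
```

When the estimated heterogeneity is zero, the pooled point estimate reduces to the common-target pool, mirroring the abstract's distinction between the vanishing and strictly positive regimes of the hyperparameter.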

arXiv📈🤖
Shrinkage through multiple identifiability
By Meixide, Insua

11 hours ago 0 0 0 0
We discuss the regression-by-composition framework of Farewell, Daniel, Stensrud and Huitfeldt, highlighting a key consequence of its sequential construction: order dependence. Reordering the flows may change the implied conditional distribution, the interpretation of model parameters, and the associated estimation problem, with consequences for model specification, interpretation, and inference.

arXiv📈🤖
Order Dependence in Regression by Composition: Discussion on "Regression by Composition'' by Farewell, Daniel, Stensrud, and Huitfeldt
By Dong, Wang, Liu et al

11 hours ago 0 0 0 0
In an editorial in the Journal of Marketing, Steenkamp et al. (2026) make a valuable and timely intervention by urging marketing scholars to move beyond dichotomous significance testing and to report effect sizes that speak to substantive significance. Their editorial is especially strong in its insistence on exact p-values, richer statistical reporting, and closer alignment between rigor and relevance. Yet their framework omits the local form of Cohen's f^2, that is, f(B)^2, as an effect-size measure for the contribution of an individual predictor or predictor block B within a multivariable model. That omission matters because much of marketing research relies on regression-type models in which the central theoretical question is not merely whether a model fits globally, but whether a focal construct adds meaningful explanatory power beyond competing predictors and controls. This commentary argues that the R-squared foundation of local Cohen's f(B)^2 is a strength, especially in large-sample settings. Moreover, f-squared-type local effect sizes can be extended beyond ordinary least squares to multilevel models and, more tentatively, to neural networks and other machine-learning models.
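For concreteness, the local effect size the commentary advocates is conventionally defined as f(B)^2 = (R2_full - R2_reduced) / (1 - R2_full), comparing the model with and without the block B; here is a short sketch (the function names are ours):

```python
import numpy as np

# Local Cohen's f^2 for a predictor block B: fit the model with and without B
# and compare the two R-squared values.
def local_f2(y, X_full, X_reduced):
    def r_squared(y, X):
        X1 = np.column_stack([np.ones(len(y)), X])  # add intercept
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ beta
        return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    r2_full, r2_reduced = r_squared(y, X_full), r_squared(y, X_reduced)
    return (r2_full - r2_reduced) / (1.0 - r2_full)
```

By Cohen's conventional benchmarks, values near 0.02, 0.15, and 0.35 are read as small, medium, and large local effects.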

arXiv📈🤖
Effect Sizes in Marketing Research: Why Cohen's Local f^2 Belongs in the Toolkit
By Messner

11 hours ago 0 0 0 0
Win statistics have become increasingly popular for analyzing hierarchical composite endpoints in clinical trials, because they summarize treatment benefit through pairwise comparisons that respect the clinical importance order among outcome components. The win ratio, win odds, net benefit, and desirability of outcome ranking (DOOR) are all based on the same underlying pairwise comparison methodology and can complement one another in showing the strength of the treatment effect. Despite recent progress on win statistics, statistical inference for win statistics in cluster randomized trials (CRTs) remains underdeveloped. In this paper, we provide a comprehensive survey of testing procedures for the win ratio, win odds, net benefit, and DOOR in parallel-arm CRTs with hierarchical composite outcomes. Then, for each win statistic, we compare different testing procedures, including Wald tests based on cluster rank sum statistics and bivariate clustered U-statistics, tests that use a cluster jackknife variance, a score permutation test, a permutation-based procedure with analytical variance estimation, and a likelihood ratio test derived from clustered jackknife estimates. Through simulation studies that consider varying scenarios such as different cluster sizes, intracluster correlations, and censoring-induced ties, we characterize the finite-sample type I error and power of each procedure across a range of practical settings with small and large numbers of clusters. We illustrate our methods by reanalyzing the Strategies to Reduce Injuries and Develop Confidence in Elders (STRIDE) pragmatic CRT, and implement all win statistics methods in the WinsCRT R package.
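For readers new to these quantities, here is a toy sketch of the four win statistics for a single outcome where larger values are better; the paper's setting additionally involves a hierarchy of outcome components and cluster randomization, both of which this sketch ignores:

```python
import numpy as np

# Toy pairwise-comparison sketch (single outcome, larger is better); the
# hierarchical-composite and cluster-randomized structure is omitted.
def win_statistics(treated, control):
    t = np.asarray(treated, float)[:, None]
    c = np.asarray(control, float)[None, :]
    wins, losses = np.sum(t > c), np.sum(t < c)
    total = t.size * c.size                   # number of pairwise comparisons
    ties = total - wins - losses
    return {
        "win_ratio": wins / losses,
        "win_odds": (wins + 0.5 * ties) / (losses + 0.5 * ties),
        "net_benefit": (wins - losses) / total,
        "door": (wins + 0.5 * ties) / total,  # P(treated better), ties split
    }

print(win_statistics([5, 7, 9], [4, 6, 6]))
```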

arXiv📈🤖
Statistical inference with win statistics in cluster-randomized trials with composite outcomes
By Fang, Tong, Huang et al

12 hours ago 0 0 0 0
Stepped-wedge cluster randomized trials (SW-CRTs) evaluate interventions rolled out across clusters over time. Standard analyses typically use immediate-treatment (IT) models, which assume effects begin at crossover and remain constant thereafter. When effects vary with exposure duration, IT models may misrepresent target effects. Exposure-time indicator (ETI) models address this by allowing treatment effects to differ by time since exposure and by targeting the time-averaged treatment effect (TATE) and long-term effect (LTE). Like IT models, ETI models require specification of a random-effects structure, which is often misspecified, and the performance of robust variance estimators (RVEs) in this setting is not well understood. We review RVEs for ETI models and evaluate them in simulation studies with continuous and binary outcomes under correctly specified (binary only) and misspecified random-effects structures. We compare the classic sandwich, Kauermann-Carroll (KC), Mancl-DeRouen (MD), and Morel-Bokossa-Neerchal (MBN) estimators for inference on the TATE and LTE. Our simulations show that under misspecified random-effects structures, model-based standard errors (SEs) produced undercoverage, whereas RVEs improved performance. For continuous outcomes, MD with a t-distribution and degrees of freedom equal to the number of clusters minus two gave the most consistent coverage probabilities. For binary outcomes, MBN was the only consistently reliable option. MD, however, could be unstable in one-cluster-per-sequence designs because of data sparsity. Across scenarios, both model-based SEs and RVEs for the LTE were unstable, indicating that greater caution is needed when targeting the LTE under ETI models.
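As a small illustration of the two targets, the TATE and LTE can be written as linear contrasts of the fitted exposure-time effects; in the sketch below, delta and its covariance V would come from a fitted ETI model with one of the variance estimators compared above (the helper function is ours):

```python
import numpy as np

# delta: vector of exposure-time effects (delta_s = effect after s periods of
# exposure) from a fitted ETI model; V: its (ideally robust) covariance matrix.
def eti_summaries(delta, V):
    delta, V = np.asarray(delta, float), np.asarray(V, float)
    S = len(delta)
    c_tate = np.full(S, 1.0 / S)            # TATE: average over exposure times
    c_lte = np.zeros(S); c_lte[-1] = 1.0    # LTE: effect at the last exposure time
    tate, lte = c_tate @ delta, c_lte @ delta
    se_tate = np.sqrt(c_tate @ V @ c_tate)  # plug in a robust V (e.g., KC/MD/MBN)
    se_lte = np.sqrt(c_lte @ V @ c_lte)
    return (tate, se_tate), (lte, se_lte)
```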

arXiv📈🤖
Which Small-Sample Correction Should Be Used When Analyzing Stepped-Wedge Designs with Time-Varying Treatment Effects?
By Ouyang, Taljaard, Hughes et al

12 hours ago 1 0 0 0