Bayesian Inference
Conformal Bayes for Two-Sided Censored Gaussian Regression under Label Shift
Prediction under label shift becomes nonstandard when responses are censored. In a two-sided censored Gaussian model, latent values below $L$ and above $U$ are recorded at the boundary values, so the observed predictive distribution is mixed, with atoms at $L$ and $U$ and a continuous density on $(L,U)$. In this paper we develop conformal Bayes for this mixed-space setting by combining posterior predictive tilting with weighted conformal calibration. Under a two-sided Tobit Gaussian Bayesian prediction head with a Laplace posterior approximation, the tilted predictive distribution has left-atom, interior, and right-atom components, with a three-term closed-form normalizer. The resulting prediction set is a mixed highest density region that can combine boundary atoms with an interior interval and can reduce to atom-only sets under strong censoring. The main technical issue is that latent label shift does not directly give an ordinary density ratio on the observed censored scale. A latent exponential tilt induces tail-averaged atom weights at the censored boundaries, while the interior ratio remains density based. This yields a mixed observed-space calibration weight with two atom ratios and one interior density ratio. The weight corrects the calibration measure, while predictive tilting gives target-adapted mixed-HDR geometry. Synthetic experiments show that weighted tilted conformal Bayes restores marginal coverage with smaller sets than weighted source-score calibration, while revealing a trade-off between marginal coverage and component-wise behavior across atoms and interior observations.
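To make the mixed observed-space weight concrete, here is a minimal Python sketch. It rests on assumptions the abstract does not state: the source latent label marginal is taken to be $N(\mu,\sigma^2)$, and the target latent marginal is its exponential tilt with a hypothetical parameter `theta`. Under these assumptions the two atom ratios are ratios of tail masses (equivalently, tail averages of the latent density ratio), while the interior ratio is the ordinary pointwise density ratio. The function names are illustrative and are not taken from the paper.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def censored_weight(y_obs, L, U, mu, sigma, theta):
    """Observed-space weight for a label censored to [L, U].
    Assumption: source latent marginal N(mu, sigma^2); target latent density
    proportional to exp(theta * y) times the source density, which makes the
    target N(mu + theta * sigma^2, sigma^2)."""
    shifted = mu + theta * sigma ** 2  # mean of the tilted latent law
    if y_obs <= L:
        # left atom: ratio of lower-tail masses (a tail average of the latent ratio)
        return norm_cdf((L - shifted) / sigma) / norm_cdf((L - mu) / sigma)
    if y_obs >= U:
        # right atom: ratio of upper-tail masses
        return (1.0 - norm_cdf((U - shifted) / sigma)) / (1.0 - norm_cdf((U - mu) / sigma))
    # interior: ordinary density ratio exp(theta * y) / E_p[exp(theta * Y)]
    return math.exp(theta * y_obs - theta * mu - 0.5 * theta ** 2 * sigma ** 2)


def mean_weight_under_source(L, U, mu, sigma, theta, n=20000):
    """Sanity check: the weight should average to 1 under the source law of the
    censored label (two atoms plus a midpoint-rule integral over (L, U))."""
    p_left = norm_cdf((L - mu) / sigma)
    p_right = 1.0 - norm_cdf((U - mu) / sigma)
    h = (U - L) / n
    interior = 0.0
    for k in range(n):
        y = L + (k + 0.5) * h
        interior += censored_weight(y, L, U, mu, sigma, theta) * norm_pdf((y - mu) / sigma) / sigma * h
    return (p_left * censored_weight(L, L, U, mu, sigma, theta)
            + interior
            + p_right * censored_weight(U, L, U, mu, sigma, theta))
```

Because the tilted target is again Gaussian, every piece has a closed form. The helper `mean_weight_under_source` checks numerically that the weight integrates to one under the source censored law, which is what lets it serve as an importance weight for calibration.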
Bayesian Best-Arm Identification with Abstention: A Polynomial-to-Exponential Phase Transition
Huang, Yuqi, Hou, Yunlong, Tan, Vincent Y. F.
We study the Bayesian fixed-budget best-arm identification problem in which a learner can abstain from making a terminal recommendation. Subject to an abstention budget $\alpha$, we analyze the probability of undetected error, that is, the risk of recommending a suboptimal arm without abstaining. Our central finding is that abstention induces a phase transition: without abstention, the error probability decays polynomially in the sampling budget $T$, whereas any positive abstention budget, however small, yields exponential decay. For Gaussian priors and rewards, in the regime $T\to\infty$ followed by $\alpha\downarrow0$, we establish exactly matching information-theoretic lower bounds and algorithmic upper bounds on the optimal error exponent, which takes the form $\exp(-\frac{\alpha^{2}T}{8\kappa_\nu^{2}})$. The hardness parameter $\kappa_\nu$ is the prior density of the top-two gap at zero, so nearly tied instances drive the fundamental error. We introduce an adaptive algorithm, PGWS, that achieves this optimal exponent by spending its abstention budget on statistically ambiguous instances. We further show that this polynomial-to-exponential improvement is exclusively a Bayesian phenomenon: in the frequentist setting, abstention affects only lower-order exponent terms. We also extend our results beyond the Gaussian model.
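The hardness parameter can be estimated directly from its description. The sketch below reads "prior density of the top-two gap at zero" as the density, at zero, of (largest arm mean minus second-largest arm mean) under independent Gaussian priors, and estimates it by Monte Carlo as $P(\text{gap}\le h)/h$. The paper's exact definition and normalization may differ, and `error_exponent_rate` only evaluates the stated asymptotic form; it is not a finite-$T$ guarantee. Both function names are hypothetical.

```python
import math
import random


def top_two_gap_density_at_zero(prior_means, prior_sds, h=0.02, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the prior density of the top-two gap at zero,
    computed as P(gap <= h) / h with gap = largest mean - second-largest mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        draws = sorted(rng.gauss(m, s) for m, s in zip(prior_means, prior_sds))
        if draws[-1] - draws[-2] <= h:
            hits += 1
    return hits / (n_draws * h)


def error_exponent_rate(alpha, T, kappa):
    """Evaluate exp(-alpha^2 T / (8 kappa^2)), the form of the optimal exponent."""
    return math.exp(-alpha ** 2 * T / (8.0 * kappa ** 2))
```

For two arms with independent $N(0,1)$ priors, the gap is $|\mu_1-\mu_2|$ with $\mu_1-\mu_2\sim N(0,2)$, so its density at zero is $1/\sqrt{\pi}\approx 0.564$, which the estimate should approach as `n_draws` grows and `h` shrinks.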
A Mathematical Optimization Approach for Expert-Informed Bayesian Best Subset Selection
Alexander, Nolan, Mortveit, Henning
A central challenge in statistical modeling is identifying the subset of features that belong in the true regression model. The classical best subset selection problem, recently made tractable via mixed-integer optimization (MIO), finds the globally optimal sparse solution. It does not, however, make use of any information beyond the observed data. In many applied settings, domain experts can meaningfully rank or score the relevance of candidate predictors, yet no existing framework integrates such probabilistic expert assessments directly into the best-subsets objective. This paper presents Expert-Implied Bayesian Best Subsets (EBBS), a method that incorporates domain-expert probability estimates of feature relevance into the MIO best-subsets problem through a maximum a posteriori (MAP) framework. Expert views from multiple respondents are aggregated into a single prior probability per feature using the Poisson binomial distribution for marginal probability estimates, the pairwise win rate for pairwise comparisons, or the normalized mean rank for ordinal rankings. This probability enters the objective function as a log-odds penalty term that smoothly encourages or discourages the selection of each feature in line with the expert consensus. This paper provides analytic derivations of the MAP formulation and characterizes its theoretical properties. The proposed model reduces to Best Subsets when no expert expresses a view. Empirical results on synthetic and real datasets are forthcoming.
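One way to see where a log-odds penalty comes from: with Gaussian noise of known variance $\sigma^2$, independent Bernoulli($\pi_j$) inclusion priors, and the coefficients profiled out, the negative log posterior of a support $S$ is, up to constants, $\mathrm{RSS}(S)/(2\sigma^2)+\sum_{j\in S}\log\frac{1-\pi_j}{\pi_j}$. The Python sketch below assumes that form, takes the aggregated per-feature priors $\pi_j$ as given (it does not reproduce the Poisson binomial, win-rate, or mean-rank aggregation), and replaces the MIO solver with brute-force enumeration, which is only feasible for a handful of features. All names are hypothetical.

```python
import itertools
import math

import numpy as np


def log_odds_penalty(prior_prob):
    """Penalty log((1 - pi) / pi) for including a feature with prior inclusion
    probability pi in (0, 1); zero when pi = 0.5 (no expert view)."""
    return math.log((1.0 - prior_prob) / prior_prob)


def residual_sum_of_squares(X, y, subset):
    """Least-squares RSS using only the columns listed in `subset`."""
    if not subset:
        return float(y @ y)
    Xs = X[:, list(subset)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return float(resid @ resid)


def expert_best_subset(X, y, prior_probs, k, sigma2=1.0):
    """Exhaustively minimise RSS / (2 sigma2) + sum of log-odds penalties over
    all subsets of at most k features (enumeration stands in for MIO)."""
    best_subset, best_value = (), math.inf
    for size in range(k + 1):
        for subset in itertools.combinations(range(X.shape[1]), size):
            value = residual_sum_of_squares(X, y, subset) / (2.0 * sigma2)
            value += sum(log_odds_penalty(prior_probs[j]) for j in subset)
            if value < best_value:
                best_subset, best_value = subset, value
    return best_subset, best_value
```

Setting every $\pi_j=0.5$ zeroes the penalty, so the search returns the ordinary best subset, matching the reduction stated in the abstract; a confident expert prior can instead tip the choice between two nearly collinear features.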
Perspectives on Latent Factor Indeterminacy and its Implications for Data Representation
The common factor analytic model is related to Helmholtz and Boltzmann machines, can be conceived as a linear autoencoder, or can be thought of as a single-hidden-layer generative neural network. We thus consider it a basal generative representation learner that can be used as a minimal model for studying the foundational characteristics of (deep) generative model architectures. We focus on the fundamental problem of indeterminacy in latent factor projections. This indeterminacy implies that, even when the intrinsic dimension of the latent vector is known, regularity conditions are met, and rotational indeterminacy is resolved, an inherent indefiniteness in the retrieval of causative latent sources remains: they will be uncertain, distributionally deviant, and non-unique. This can have major implications for data representation but remains an elusive issue, even to practitioners and theorists well-versed in the factor model. Moreover, this classic psychometric problem is intricately related to the modern issue of latent variable collapse in the variational autoencoder framework for deep generative modeling. Here, we assess this indeterminacy from various perspectives, show how these perspectives are mathematically and conceptually related, and discuss the resulting implications for the Psychometrics, Statistics, and Artificial Intelligence communities. We show that latent factor determinacy holds across all of its facets as the feature dimension grows to infinity. This underpins an essentially distribution-free estimation approach in the sample case when the number of features is very large. We conclude, as these are emergent properties at scale, that the factor model is suited for representation learning of very-high-dimensional data.
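A classical way to quantify this indeterminacy, for a one-factor model $x=\lambda f+e$ with $\mathrm{Var}(f)=1$ and diagonal $\mathrm{Cov}(e)=\Psi$, is the squared multiple correlation $\rho^2=\lambda^\top\Sigma^{-1}\lambda$ between the factor and the observed variables, together with Guttman's minimum correlation $2\rho^2-1$ between two equally valid factor constructions. The sketch below uses the Sherman-Morrison identity $\rho^2=s/(1+s)$ with $s=\sum_j\lambda_j^2/\psi_j$. It illustrates the large-$p$ limit stated in the abstract but is not the authors' analysis, and the function names are illustrative.

```python
def factor_determinacy(loadings, uniquenesses):
    """rho^2 = lambda' Sigma^{-1} lambda for a one-factor model with
    Sigma = lambda lambda' + diag(psi); by Sherman-Morrison this equals s / (1 + s)
    with s = sum_j lambda_j^2 / psi_j."""
    s = sum(l * l / psi for l, psi in zip(loadings, uniquenesses))
    return s / (1.0 + s)


def guttman_min_correlation(rho2):
    """Minimum correlation 2 rho^2 - 1 between two equally valid factor constructions."""
    return 2.0 * rho2 - 1.0


if __name__ == "__main__":
    for p in (4, 16, 64, 256):
        rho2 = factor_determinacy([0.6] * p, [0.64] * p)
        print(p, round(rho2, 4), round(guttman_min_correlation(rho2), 4))
```

With equal loadings $0.6$ and uniquenesses $0.64$, $s=0.5625\,p$: four indicators give $\rho^2\approx0.69$ and a minimum correlation of only about $0.38$, whereas both quantities tend to $1$ as $p$ grows.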
Conformal Bayes under Label Shift: Post-Hoc Calibration vs. In-Training Adaptation
Conformal Bayes combines Bayesian posterior predictives with conformal calibration to produce prediction sets that are both statistically valid and geometrically efficient. We study conformal Bayes under label shift from a unified perspective, identifying two complementary approaches that both restore nominal target-domain coverage through importance-weighted conformal calibration but operate through independent mechanisms. Post-hoc calibration tilts the posterior predictive toward the target domain and corrects the conformal threshold via an importance-weighted quantile, leaving the parameter posterior unchanged. In-training adaptation tilts the parameter posterior itself toward the target domain, producing a corrected predictive whose highest predictive density (HPD) region serves as the prediction set under the fitted target predictive; its efficiency is model-dependent and does not imply finite-sample conditional optimality. Two controlled experiments isolate the regime dependence of each strategy: in the low-dimensional, well-estimated regime Strategy A produces the narrowest valid intervals, while in the high-dimensional, underdetermined regime Strategy B achieves up to $43\%$ width reduction at unchanged coverage, under the stated source-sampling and label-shift assumptions.
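Both strategies rely on an importance-weighted conformal quantile. A minimal sketch of the standard weighted split-conformal threshold, with the test point's weight placed at $+\infty$, follows. Under label shift the weights would be label-density ratios $w(y)=q_Y(y)/p_Y(y)$ evaluated at the calibration labels, with the test weight evaluated at each candidate label. The function name is illustrative, and the nonconformity scores themselves (for example, negative posterior predictive density) are left to the caller.

```python
import math


def weighted_conformal_threshold(cal_scores, cal_weights, test_weight, alpha):
    """Level-(1 - alpha) quantile of the weighted empirical law
    sum_i p_i * delta(s_i) + p_test * delta(+inf), where p_i = w_i / total."""
    total = sum(cal_weights) + test_weight
    target = (1.0 - alpha) * total
    cumulative = 0.0
    for score, weight in sorted(zip(cal_scores, cal_weights)):
        cumulative += weight
        # small tolerance guards against round-off when cumulative equals target
        if cumulative >= target - 1e-12 * total:
            return score
    return math.inf  # the test point's mass at +inf is needed to reach 1 - alpha
```

With equal weights this reduces to the usual split-conformal rule, the $\lceil (n+1)(1-\alpha)\rceil$-th smallest score, and it returns $+\infty$ when that index exceeds $n$.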
Smoothness-Based Derandomization of PAC-Bayes Bounds
Paquin, Alexandre Lemire, Chaib-Draa, Brahim, Giguère, Philippe
We study PAC-Bayes derandomization for smooth loss functions. Our goal is to obtain generalization bounds that hold with high probability for deterministic predictors by exploiting smoothness properties of both the loss and the predictor class. We show that passing from the Gibbs predictor to the deterministic predictor at the posterior mean has a precise cost, given by the generalization gap of the Jensen gap class. We control this class through its Rademacher complexity, leading to bounds for deterministic predictors that involve flatness quantities expressed in terms of parameter Jacobians and Hessians of the score map. The framework applies to both bounded and unbounded smooth loss functions, and we specialize the results to linear predictors and smooth neural networks. Finally, the Jacobian and Hessian quantities appearing in the theory motivate a practical regularizer. For BatchNorm networks, we compute this regularizer with respect to effective BatchNorm weights obtained by folding the BatchNorm transformation into the adjacent affine weights. Experiments on CIFAR-10 illustrate the behavior of this regularizer under different batch sizes.
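For intuition about the derandomization cost, consider the simplest case mentioned in the abstract: a linear predictor $f_\theta(x)=\theta^\top x$ with squared loss and an assumed Gaussian posterior $Q=N(m,s^2I)$. The per-example Jensen gap between the Gibbs loss and the loss at the posterior mean is exactly $s^2\|x\|^2$, where $x$ is the parameter Jacobian of the score map. The sketch compares that closed form with a Monte Carlo estimate. It illustrates one member of the Jensen gap class, not the Rademacher-complexity bound itself, and the function names are hypothetical.

```python
import random


def jensen_gap_closed_form(x, s):
    """E_Q[(theta.x - y)^2] - (m.x - y)^2 = s^2 ||x||^2 under Q = N(m, s^2 I)."""
    return s * s * sum(xi * xi for xi in x)


def jensen_gap_monte_carlo(x, y, m, s, n_draws=100_000, seed=0):
    """Gibbs squared loss (averaged over theta ~ Q) minus squared loss at the mean."""
    rng = random.Random(seed)
    mean_pred = sum(mi * xi for mi, xi in zip(m, x))
    gibbs = 0.0
    for _ in range(n_draws):
        pred = sum((mi + s * rng.gauss(0.0, 1.0)) * xi for mi, xi in zip(m, x))
        gibbs += (pred - y) ** 2
    return gibbs / n_draws - (mean_pred - y) ** 2
```

The gap grows with the squared Jacobian norm and the posterior spread, which is the kind of flatness quantity the bounds and the proposed regularizer are built from.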
XMSE-Aware Adaptive Empirical Bayes Estimation
Empirical Bayes (EB) estimators can match the first-order asymptotic risk of maximum likelihood (ML) while behaving very differently at second order: recent excess mean squared error (XMSE) analysis shows that kernel-based EB estimation may be worse than ML when the kernel is poorly aligned with the true parameter. This paper turns that diagnostic into a design principle. We propose an XMSE-aware mixed estimator that interpolates between ML and EB shrinkage. Its fixed-weight XMSE is a scalar quadratic in the mixing weight, yielding a closed-form oracle mixing weight that is no worse than either ML or the base EB estimator at the XMSE scale. A plug-in implementation based on finite-sample XMSE approximations is proved consistent, with a second-order oracle regret rate for an interior oracle weight. We further establish a transfer of the regret bound to the fixed-weight risk curve evaluated at the selected weight, a thresholded boundary rule, and extensions to compact kernel families and to finite and growing kernel dictionaries with high-probability oracle bounds. Finite impulse response simulations with SURE-tuned, hard-selection, and trace-corrected baselines, together with the public Silverbox and Cascaded Tanks benchmarks, show that the proposed estimator retains most of the benefit of regularization when it is helpful and retreats toward ML under kernel misspecification, with an identified finite-de analyzed on the benchmarks.
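Assume the fixed-weight XMSE of $\hat\theta_w=(1-w)\hat\theta_{\mathrm{ML}}+w\,\hat\theta_{\mathrm{EB}}$ takes the usual convex-combination form $(1-w)^2X_{\mathrm{ML}}+2w(1-w)C+w^2X_{\mathrm{EB}}$, with $C$ the cross term between the two error components. Then the oracle weight is $w^\star=(X_{\mathrm{ML}}-C)/(X_{\mathrm{ML}}+X_{\mathrm{EB}}-2C)$. The sketch below clips it to $[0,1]$, which guarantees a value no larger than either endpoint. The paper's exact XMSE expressions and plug-in estimates are not reproduced here, and the names are hypothetical.

```python
def xmse_curve(w, x_ml, x_eb, c):
    """Assumed quadratic XMSE of the mixture (1 - w) * ML + w * EB."""
    return (1.0 - w) ** 2 * x_ml + 2.0 * w * (1.0 - w) * c + w ** 2 * x_eb


def oracle_weight(x_ml, x_eb, c):
    """Minimiser of xmse_curve over w, clipped to [0, 1]."""
    denom = x_ml + x_eb - 2.0 * c  # XMSE of the difference of the two error terms
    if denom <= 0.0:  # degenerate case: the two error terms coincide
        return 0.0
    return min(1.0, max(0.0, (x_ml - c) / denom))
```

When the cross term $C$ reaches $X_{\mathrm{ML}}$, moving away from $w=0$ cannot lower the XMSE to first order, so the clipped rule returns pure ML, mirroring the retreat toward ML under misspecification.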
Statistical and Structural Approaches to Algorithmic Fairness
Modern machine learning systems have outgrown their origins as isolated predictive constructs, evolving into complex socio-technical architectures that actively mediate human opportunity. As algorithms increasingly determine access to economic and social opportunities, it has become widely recognized that these systems are deeply entangled with the structural inequalities and prejudices of their environments. The field of algorithmic fairness emerged in response to the growing recognition that models optimized for predictive accuracy can systematically disadvantage marginalized groups. Early mitigation strategies, however, rested on fragile simplifications that limited their effectiveness in complex socio-technical environments. This thesis identifies and addresses two fundamental limitations of contemporary fairness paradigms: the reliance on deterministic point estimates for auditing and the treatment of individuals as isolated entities devoid of structural context. First, the diagnosis of algorithmic unfairness has traditionally depended on scalar metrics that fail to capture the nuances of real-world deployment. This deterministic approach ignores the high statistical variance inherent in small, intersectional groups, often leading to false alarms or missed detections of bias. Furthermore, standard auditing struggles with the opacity of black-box models, frequently conflating unjustifiable bias with the influence of legitimate features.
Beyond Global Divergences: A Local-Mass Perspective on Bayesian Inference
Xu, Hanli, He, Fengxiang, Moka, Sarat
Global objectives, such as KL divergence and ELBO, are widely used in Bayesian inference for measuring distributional discrepancy. This paper studies the local-mass behaviour that such objectives do not directly capture. We introduce and use two mathematical tools: (1) Mass Index for recording the polynomial and logarithmic decay scales of local mass, and (2) regularised extended KL (RE-KL), a set-localised divergence that can be formulated in the presence of singular components. Mass Indices help characterise how Bayesian updating changes local mass: (1) power-log likelihood factors shift it explicitly, and (2) parameter-dependent supports, or their smooth softenings, may change the local scale through the amount of mass that remains near the parameter value. Using local RE-KL, we prove absolute, relative, and directional inequalities for comparing local small-ball masses under the two KL directions. Together, these results provide a theoretical account of local-mass behaviour. Experiments provide controlled illustrations of this behaviour. Code is available at https://github.com/Forsythia0604/Local-Mass-Framework.
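As a toy illustration of how a power likelihood factor shifts the polynomial decay scale of local mass, the sketch multiplies a standard normal prior by $|x|^k$ and estimates the exponent $a$ in $P(|X|\le r)\approx C\,r^a$ from the ratio of masses at $r$ and $r/2$. It covers only the one-dimensional polynomial part of a Mass Index, not the logarithmic scale or RE-KL, and the function names are illustrative rather than taken from the linked code.

```python
import math


def ball_mass(r, k, n=4000):
    """Mass of [-r, r] under the unnormalised density |x|^k * phi(x)
    (standard normal prior times a power likelihood factor), midpoint rule."""
    h = 2.0 * r / n
    total = 0.0
    for i in range(n):
        x = -r + (i + 0.5) * h
        total += abs(x) ** k * math.exp(-0.5 * x * x) * h
    return total / math.sqrt(2.0 * math.pi)


def local_power_index(k, r=1e-3):
    """Estimate a in mass(r) ~ C * r^a from log2(mass(r) / mass(r / 2))."""
    return math.log2(ball_mass(r, k) / ball_mass(r / 2.0, k))
```

Near zero the density behaves like $|x|^k\varphi(0)$, so the mass of $[-r,r]$ scales as $r^{k+1}$: the factor $|x|^k$ shifts the index from $1$ to $k+1$.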
Ribbon: Scalable Approximation and Robust Uncertainty Quantification
Gibson, Graham, Tipton, John, Rumsey, Kellin, Klein, Natalie
Reliably quantifying predictive uncertainty is difficult for complex, high-dimensional, or misspecified models. Both fully Bayesian and bootstrap resampling methods provide principled uncertainty estimates but are often too expensive for modern machine-learning models because they require posterior sampling or repeated model refitting. We introduce Ribbon, a scalable approximation to Dirichlet-reweighted bootstrap uncertainty. Ribbon replaces repeated refitting with an influence-function linearization around a single fitted model, preserving the first-order data-reweighting structure of the Bayesian bootstrap while requiring only post-hoc linear algebra. Ribbon approximates the Bayesian-bootstrap or weighted-likelihood-bootstrap refitting target. With a general concentration parameter, Ribbon gives a calibrated Dirichlet-reweighting family whose uncertainty scale can be tuned on validation data. We show that Ribbon is asymptotically equivalent to a flat-prior Laplace approximation under correct likelihood specification and recovers the robust sandwich covariance under misspecification. Across synthetic regression, MNIST classification, and California Housing benchmarks, Ribbon provides competitive predictive performance and improved calibration in several settings while avoiding repeated model retraining.
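For ordinary least squares the linearization fits in a few lines: the weighted estimate satisfies $\hat\beta(w)\approx\hat\beta+H^{-1}\frac1n\sum_i(w_i-1)x_ir_i$ with $H=X^\top X/n$ and residuals $r_i$. The NumPy sketch below draws $n\cdot\mathrm{Dirichlet}(c,\dots,c)$ weights for a concentration `c` and pushes them through that first-order map, so only one fit is needed. It sketches the idea for a linear model only; it is not the Ribbon implementation, and the function names are hypothetical.

```python
import numpy as np


def linearized_dirichlet_draws(X, y, n_draws=4000, concentration=1.0, seed=0):
    """First-order (influence-function) approximation to Dirichlet-reweighted
    least squares, built around a single OLS fit."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    H_inv = np.linalg.inv(X.T @ X / n)          # inverse of H = X'X / n
    influence = (X * resid[:, None]) @ H_inv.T  # row i: H^{-1} x_i r_i
    # n * Dirichlet(c, ..., c) weights have mean 1 per observation
    weights = n * rng.dirichlet(np.full(n, concentration), size=n_draws)
    # beta(w) ~ beta_hat + (1/n) * sum_i (w_i - 1) * H^{-1} x_i r_i
    return beta_hat + (weights - 1.0) @ influence / n


def sandwich_covariance(X, y):
    """Heteroskedasticity-robust (sandwich) covariance of the OLS estimate."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    bread = np.linalg.inv(X.T @ X)
    meat = (X * resid[:, None] ** 2).T @ X
    return bread @ meat @ bread
```

Because the scaled Dirichlet weights have covariance $\frac{n}{nc+1}(I-\frac1n\mathbf 1\mathbf 1^\top)$ and the OLS scores sum to zero, the draws have covariance $\frac{n}{nc+1}$ times the sandwich estimator: roughly the sandwich at $c=1$ (the Bayesian bootstrap) and about $1/c$ of it otherwise, which is the tunable uncertainty scale described in the abstract.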