exchangeability
Leave a Window Out: Modifying the Jackknife for Predictive Inference in Time Series
Jiang, Hanyang, Barber, Rina Foygel, Pananjady, Ashwin, Xie, Yao
Conformal prediction methods enjoy strong theoretical and empirical predictive inference performance, provided the data is exchangeable, and predictors are trained in a memoryless fashion. However, these assumptions and constraints are impractical in many real-data settings, such as time series (where temporal dependence violates exchangeability, and where memoryless predictors will inevitably have poor predictive accuracy). Recent work shows that the split conformal prediction method is robust to these issues of memory-based predictors and deviations from exchangeability that are common features of time-series data. However, since using sample splitting can lead to lower accuracy, this motivates asking whether other predictive inference methods (that do not rely on data splitting) could also be reliably used in the time series setting. In this work, we show that the vanilla leave-one-out jackknife can suffer an arbitrary loss of coverage even in canonical time series models with mild temporal dependence. As a remedy, we propose a careful modification tailored to such settings, which we term the \emph{leave-a-window-out} (LWO) method, and show that it can achieve valid coverage provided that the model-fitting procedure satisfies mild stability properties. Our proofs are based on quantifying the degree to which the data departs from \emph{cyclic exchangeability}, and we introduce new coefficients to measure the extent of this departure. Experiments on time series data demonstrate that our LWO method often enjoys valid coverage when the vanilla jackknife fails to cover, while producing much narrower intervals than split conformal prediction.
Structure-Adaptive Conformal Inference for Large-Scale Out-of-Distribution Testing
Sun, Rongyi, Sun, Wenguang, Zhao, Zinan
This paper addresses structured out-of-distribution (OOD) testing in high-stakes machine learning applications. Traditional conformal methods rely on joint exchangeability, making it difficult to incorporate auxiliary information such as spatiotemporal or grouping structures. To overcome this limitation, we propose the structure-adaptive conformal q-value (SCQ), a significance index that integrates individual test evidence with structural patterns. We also develop pseudo-score-guided transductive automated model selection (P-TAMS), which adapts conformalized model selection to structured OOD testing across a toolbox of candidate models. Together, SCQ and P-TAMS form a unified framework under pairwise exchangeability, providing finite-sample error-rate control, improved power, and enhanced interpretability. Experiments on simulated and real data demonstrate that the proposed approach controls the false discovery rate and performs well across diverse settings.
Conditional Predictive Inference for General Structured Data with Group Symmetries
We study distribution-free predictive inference for data with group symmetries, aiming to establish near-conditional coverage guarantees beyond exchangeability for structured data. While many predictive inference methods achieve a target coverage level, most provide marginal coverage. In practice, conditional predictive inference is often preferred, as it quantifies uncertainty for black-box predictions given observed attributes, thereby accommodating heterogeneity. Although many efforts have pursued efficient conditional coverage, existing methods rely on the i.i.d. or exchangeable assumption, often violated in structured settings such as networks, clusters, and imaging data. Recently, SymmPI introduced a unified approach to predictive inference under group symmetries beyond exchangeability; nevertheless, its guarantees remain marginal and do not account for population heterogeneity. To bridge this gap, we introduce C-SymmPI, a framework that achieves near-conditional coverage under general data structures with group symmetries, extending beyond exchangeability to cover networks, cluster-level data, and related structures. Inspired by relaxed multi-accuracy, our approach reformulates conditional coverage as miscoverage error over a user-specified function class. We establish theoretical guarantees under distributional invariance and distribution shift, and derive convergence rates for linear and RKHS function classes, recovering state-of-the-art results in the exchangeable setting as special cases. For computational efficiency, we develop two variants: a projection-based algorithm for high-dimensional observations, and a sampling-based algorithm for large or infinite groups. We demonstrate effectiveness on hierarchical and network data. Empirical results show that C-SymmPI delivers more informative and stable conditional coverage with improved accuracy compared to existing methods.
CRPS-Optimal Binning for Univariate Conformal Regression
We propose a method for non-parametric conditional distribution estimation based on partitioning covariate-sorted observations into contiguous bins and using the within-bin empirical CDF as the predictive distribution. Bin boundaries are chosen to minimise the total leave-one-out Continuous Ranked Probability Score (LOO-CRPS), which admits a closed-form cost function with $O(n^2 \log n)$ precomputation and $O(n^2)$ storage; the globally optimal $K$-partition is recovered by a dynamic programme in $O(n^2 K)$ time. Minimisation of within-sample LOO-CRPS turns out to be inappropriate for selecting $K$ as it results in in-sample optimism. We instead select $K$ by $K$-fold cross-validation of test CRPS, which yields a U-shaped criterion with a well-defined minimum. Having selected $K^*$ and fitted the full-data partition, we form two complementary predictive objects: the Venn prediction band and a conformal prediction set based on CRPS as the nonconformity score, which carries a finite-sample marginal coverage guarantee at any prescribed level $\varepsilon$. The conformal prediction is transductive and data-efficient, as all observations are used for both partitioning and p-value calculation, with no need to reserve a hold-out set. On real benchmarks against split-conformal competitors (Gaussian split conformal, CQR, CQR-QRF, and conformalized isotonic distributional regression), the method produces substantially narrower prediction intervals while maintaining near-nominal coverage.
Weighted Bayesian Conformal Prediction
Conformal prediction provides distribution-free prediction intervals with finite-sample coverage guarantees, and recent work by Snell \& Griffiths reframes it as Bayesian Quadrature (BQ-CP), yielding powerful data-conditional guarantees via Dirichlet posteriors over thresholds. However, BQ-CP fundamentally requires the i.i.d. assumption -- a limitation the authors themselves identify. Meanwhile, weighted conformal prediction handles distribution shift via importance weights but remains frequentist, producing only point-estimate thresholds. We propose \textbf{Weighted Bayesian Conformal Prediction (WBCP)}, which generalizes BQ-CP to arbitrary importance-weighted settings by replacing the uniform Dirichlet $\Dir(1,\ldots,1)$ with a weighted Dirichlet $\Dir(\neff \cdot \tilde{w}_1, \ldots, \neff \cdot \tilde{w}_n)$, where $\neff$ is Kish's effective sample size. We prove four theoretical results: (1)~$\neff$ is the unique concentration parameter matching frequentist and Bayesian variances; (2)~posterior standard deviation decays as $O(1/\sqrt{\neff})$; (3)~BQ-CP's stochastic dominance guarantee extends to per-weight-profile data-conditional guarantees; (4)~the HPD threshold provides $O(1/\sqrt{\neff})$ improvement in conditional coverage. We instantiate WBCP for spatial prediction as \emph{Geographical BQ-CP}, where kernel-based spatial weights yield per-location posteriors with interpretable diagnostics. Experiments on synthetic and real-world spatial datasets demonstrate that WBCP maintains coverage guarantees while providing substantially richer uncertainty information.
Edge-exchangeable graphs and sparsity
Many popular network models rely on the assumption of (vertex) exchangeability, in which the distribution of the graph is invariant to relabelings of the vertices. However, the Aldous-Hoover theorem guarantees that these graphs are dense or empty with probability one, whereas many real-world graphs are sparse. We present an alternative notion of exchangeability for random graphs, which we call edge exchangeability, in which the distribution of a graph sequence is invariant to the order of the edges. We demonstrate that edge-exchangeable models, unlike models that are traditionally vertex exchangeable, can exhibit sparsity. To do so, we outline a general framework for graph generative models; by contrast to the pioneering work of Caron and Fox (2015), models within our framework are stationary across steps of the graph sequence. In particular, our model grows the graph by instantiating more latent atoms of a single random measure as the dataset size increases, rather than adding new atoms to the measure.