
Collaborating Authors

 Li, Yufan


Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling

arXiv.org Machine Learning

Department of Statistics, Harvard University, 1 Oxford Street, Cambridge, MA

Abstract: We study the fundamental problem of calibrating a linear binary classifier of the form σ(ŵ⊤x), where the feature vector x is Gaussian, σ is a link function, and ŵ is an estimator of the true linear weight w. By interpolating with a non-informative chance classifier, we construct a well-calibrated predictor whose interpolation weight depends on the angle ∠(ŵ, w) between the estimator ŵ and the true linear weight w. We establish that this angular calibration approach is provably well-calibrated in a high-dimensional regime where the numbers of samples and features both diverge at a comparable rate. The angle ∠(ŵ, w) can be consistently estimated. Furthermore, the resulting predictor is uniquely Bregman-optimal: it minimizes the Bregman divergence to the true label distribution within a suitable class of calibrated predictors. Our work is the first to provide a calibration strategy that provably satisfies both calibration and optimality properties in high dimensions. Additionally, we identify conditions under which a classical Platt-scaling predictor converges to our Bregman-optimal calibrated solution; thus Platt scaling also provably inherits these desirable properties in high dimensions.

Keywords: Calibration; Binary Classification; High Dimensions; Bregman Divergence

1. Introduction. Calibration of predictive models is a fundamental problem in statistics and machine learning, especially in applications that require reliable uncertainty quantification.
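The interpolation step described in the abstract can be sketched in a few lines. Note that the paper's interpolation weight is a specific function of the angle between the estimator and the true weight; the quadratic weight `cos_angle ** 2` below, and all function names, are purely illustrative stand-ins.

```python
import numpy as np

def angular_calibrate(scores, cos_angle):
    """Interpolate raw scores with the chance prediction 1/2.

    `cos_angle` is the cosine of the angle between w_hat and the true w.
    The quadratic interpolation weight here is a hypothetical choice,
    not the paper's exact formula.
    """
    lam = cos_angle ** 2
    return lam * scores + (1.0 - lam) * 0.5

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))                    # Gaussian features
w_hat = rng.standard_normal(50)                        # some estimator of w
raw = 1.0 / (1.0 + np.exp(-X @ w_hat / np.sqrt(50)))   # sigmoid link scores
calibrated = angular_calibrate(raw, cos_angle=0.8)
```

When ŵ carries no signal about w (cos_angle = 0), the predictor collapses to the chance classifier 1/2, matching the abstract's interpolation idea.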


ROTI-GCV: Generalized Cross-Validation for right-ROTationally Invariant Data

arXiv.org Machine Learning

Two key tasks in high-dimensional regularized regression are tuning the regularization strength for good predictions and estimating the out-of-sample risk. It is known that the standard approach, $k$-fold cross-validation, is inconsistent in modern high-dimensional settings. While leave-one-out and generalized cross-validation remain consistent in some high-dimensional cases, they become inconsistent when samples are dependent or contain heavy-tailed covariates. To model structured sample dependence and heavy tails, we use right-rotationally invariant covariate distributions, a concept borrowed from compressed sensing. In the common modern proportional asymptotics regime, where the numbers of features and samples grow comparably, we introduce a new framework, ROTI-GCV, for reliably performing cross-validation. Along the way, we propose new estimators for the signal-to-noise ratio and noise variance under these challenging conditions. We conduct extensive experiments that demonstrate the power of our approach and its superiority over existing methods.
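For context, the classical generalized cross-validation criterion that ROTI-GCV builds on can be written compactly for ridge regression. This sketch shows only the standard criterion; the ROTI-specific adjustment for right-rotationally invariant designs is not reproduced here, and all names are illustrative.

```python
import numpy as np

def gcv_ridge(X, y, lam):
    """Classical GCV score for ridge regression with penalty `lam`.

    ROTI-GCV replaces this criterion with one that stays consistent for
    right-rotationally invariant designs; that adjustment is not shown here.
    """
    n, p = X.shape
    s = np.linalg.svd(X, compute_uv=False)
    df = np.sum(s**2 / (s**2 + lam))           # effective degrees of freedom
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    resid = y - X @ beta
    return (resid @ resid / n) / (1.0 - df / n) ** 2

# Tune the regularization strength by minimizing the GCV score over a grid.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))
y = X @ rng.standard_normal(20) + rng.standard_normal(100)
scores = {lam: gcv_ridge(X, y, lam) for lam in [0.01, 0.1, 1.0, 10.0]}
best_lam = min(scores, key=scores.get)
```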


Spectrum-Aware Adjustment: A New Debiasing Framework with Applications to Principal Component Regression

arXiv.org Machine Learning

We introduce a new debiasing framework for high-dimensional linear regression that bypasses the restrictions on covariate distributions imposed by modern debiasing technology. We study the prevalent setting where the number of features and samples are both large and comparable. In this context, state-of-the-art debiasing technology uses a degrees-of-freedom correction to remove the shrinkage bias of regularized estimators and conduct inference. However, this method requires that the observed samples are i.i.d., the covariates follow a mean zero Gaussian distribution, and reliable covariance matrix estimates for observed features are available. This approach struggles when (i) covariates are non-Gaussian with heavy tails or asymmetric distributions, (ii) rows of the design exhibit heterogeneity or dependencies, and (iii) reliable feature covariance estimates are lacking. To address these, we develop a new strategy where the debiasing correction is a rescaled gradient descent step (suitably initialized) with step size determined by the spectrum of the sample covariance matrix. Unlike prior work, we assume that eigenvectors of this matrix are uniform draws from the orthogonal group. We show this assumption remains valid in diverse situations where traditional debiasing fails, including designs with complex row-column dependencies, heavy tails, asymmetric properties, and latent low-rank structures. We establish asymptotic normality of our proposed estimator (centered and scaled) under various convergence notions. Moreover, we develop a consistent estimator for its asymptotic variance. Lastly, we introduce a debiased Principal Components Regression (PCR) technique using our Spectrum-Aware approach. In varied simulations and real data experiments, we observe that our method outperforms degrees-of-freedom debiasing by a clear margin.
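A minimal sketch of the correction as described: a single rescaled gradient step on the squared loss applied to a regularized estimate. In the paper the step size is derived from the spectrum of the sample covariance matrix; here `step` is a free parameter and the function name is hypothetical.

```python
import numpy as np

def spectrum_aware_debias(X, y, beta_hat, step):
    """One rescaled gradient-descent step on the squared loss, applied as a
    debiasing correction to a regularized estimate `beta_hat`.

    The paper chooses `step` from the spectrum of the sample covariance;
    here it is passed in directly for illustration.
    """
    n = X.shape[0]
    grad = X.T @ (y - X @ beta_hat) / n   # negative gradient of 0.5 * MSE
    return beta_hat + step * grad

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal(10) + rng.standard_normal(200)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_shrunk = 0.5 * beta_ols                      # mimic shrinkage bias
debiased = spectrum_aware_debias(X, y, beta_shrunk, step=0.5)
```

The gradient step pushes the shrunken estimate back toward the least-squares fit, and at the least-squares solution itself the gradient vanishes, so the correction leaves an unbiased estimate unchanged.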


Balancing Risk and Reward: An Automated Phased Release Strategy

arXiv.org Machine Learning

Phased release, also known as staged rollout, is a widely used strategy in the technology industry that involves gradually releasing a new product or update to larger audiences over time [17, 30]. For example, Apple's App Store offers a phased release option where application updates are released over a 7-day period on a fixed schedule [1]. Google Play Console provides a similar feature with more flexibility in the release schedule [16]. Typically, the audiences are randomly selected at each stage from the set of all customers, and so phased releases can be thought of as a sequence of A/B tests (or randomized experiments) in which the proportion of units assigned to the treatment group changes until either the product or update is fully launched or deprecated [26, 18, 3, 33, 6]. The process of combining phased releases with A/B tests is often called controlled rollout or iterative experiments and provides companies with an important mechanism to gather feedback on early product versions [30, 20, 5]. The key advantage of phased release is its ability to mitigate risks associated with launching a new product or update directly to all users. The potential impact of faulty features is limited by releasing the update first to a small percentage of the users (i.e., the treatment group). However, this risk-averse approach introduces an opportunity cost for slowly launching beneficial features, which quickly adds up for companies that release thousands of features yearly [34].
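The ramp-up mechanics described above, a growing treatment fraction with random assignment at each stage, can be illustrated with a toy schedule. The day count and fractions below are illustrative linear choices, not Apple's or Google's actual schedules, and both function names are hypothetical.

```python
import random

def phased_release_schedule(days=7):
    """Illustrative linear ramp: fraction of users exposed on each day of a
    `days`-day staged rollout (real platform schedules differ)."""
    return [min(1.0, (d + 1) / days) for d in range(days)]

def assign_treatment(user_ids, fraction, seed=0):
    """Randomly pick a `fraction` of users as the treatment group, so each
    stage acts like one A/B test in the sequence."""
    rng = random.Random(seed)
    k = int(len(user_ids) * fraction)
    return set(rng.sample(user_ids, k))

users = list(range(10_000))
schedule = phased_release_schedule()
day1_treated = assign_treatment(users, schedule[0])
```

In a real rollout the treatment group typically grows monotonically (earlier users stay treated), which a seeded or nested sampling scheme would provide; that detail is omitted here for brevity.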