Goto

Collaborating Authors

 Search


PRIM-cipal components analysis

arXiv.org Machine Learning

EVEN supervised learning is subject to the famous NoFree Lunch Theorems [1]-[3], which say that, in combinatorial optimization, there is no universal algorithm that works better than its competitors for every objective function [4]-[6]. Indeed, David Wolpert has recently proven that, on average, cross-validation performs as well as anti-crossvalidation (choosing among a set of candidate algorithms based on which has the worst out-of-sample behavior) for supervised learning. Still, he acknowledges that "it is hard to imagine any scientist who would not prefer to use [crossvalidation] to using anti-cross-validation" [7]. On the other hand, unsupervised learning has seldom been studied from the perspective of the NFLTs. This may be because the adjective "unsupervised" suggests that no human input is needed, which is misleading as many unsupervised tasks are combinatorial optimization problems that depend on the choice of the objective function. For instance, it is well known that, among the eigenvectors of the covariance matrix, Principal Components Analysis selects those with the largest variances [8]. However, mode-hunting techniques that rely on spectral manipulation aim at the opposite objective: selecting the eigenvectors of the covariance matrix with the smallest variances [9], [10]. Therefore, unlike in supervised learning, where it is difficult to identify reasons to optimize with respect to anti-cross-validation, in unsupervised learning there are strong reasons to reduce dimensionality for variance minimization. D. A. D ıaz-Pach on and T. Liu are with the Division of Biostatistics, University of Miami, Miami, FL, 33136 USA (e-mail: ddiaz3@miami.edu,


Order-Optimal Sequential 1-Bit Mean Estimation in General Tail Regimes

arXiv.org Machine Learning

In this paper, we study the problem of mean estimation under strict 1-bit communication constraints. We propose a novel adaptive mean estimator based solely on randomized threshold queries, where each 1-bit outcome indicates whether a given sample exceeds a sequentially chosen threshold. Our estimator is $(ε, δ)$-PAC for any distribution with a bounded mean $μ\in [-λ, λ]$ and a bounded $k$-th central moment $\mathbb{E}[|X-μ|^k] \le σ^k$ for any fixed $k > 1$. Crucially, our sample complexity is order-optimal in all such tail regimes, i.e., for every such $k$ value. For $k \neq 2$, our estimator's sample complexity matches the unquantized minimax lower bounds plus an unavoidable $O(\log(λ/σ))$ localization cost. For the finite-variance case ($k=2$), our estimator's sample complexity has an extra multiplicative $O(\log(σ/ε))$ penalty, and we establish a novel information-theoretic lower bound showing that this penalty is a fundamental limit of 1-bit quantization. We also establish a significant adaptivity gap: for both threshold queries and more general interval queries, the sample complexity of any non-adaptive estimator must scale linearly with the search space parameter $λ/σ$, rendering it vastly less sample efficient than our adaptive approach. Finally, we present algorithmic variants that (i) handle an unknown sampling budget, (ii) adapt to an unknown scale parameter~$σ$ given (possibly loose) bounds, and (iii) require only two stages of adaptivity at the expense of more complicated general 1-bit queries.


Efficient machine unlearning with minimax optimality

arXiv.org Machine Learning

There is a growing demand for efficient data removal to comply with regulations like the GDPR and to mitigate the influence of biased or corrupted data. This has motivated the field of machine unlearning, which aims to eliminate the influence of specific data subsets without the cost of full retraining. In this work, we propose a statistical framework for machine unlearning with generic loss functions and establish theoretical guarantees. For squared loss, especially, we develop Unlearning Least Squares (ULS) and establish its minimax optimality for estimating the model parameter of remaining data when only the pre-trained estimator, forget samples, and a small subsample of the remaining data are available. Our results reveal that the estimation error decomposes into an oracle term and an unlearning cost determined by the forget proportion and the forget model bias. We further establish asymptotically valid inference procedures without requiring full retraining. Numerical experiments and real-data applications demonstrate that the proposed method achieves performance close to retraining while requiring substantially less data access.


Shape-Adaptive Conditional Calibration for Conformal Prediction via Minimax Optimization

arXiv.org Machine Learning

Achieving valid conditional coverage in conformal prediction is challenging due to the theoretical difficulty of satisfying pointwise constraints in finite samples. Building upon the characterization of conditional coverage through marginal moment restrictions, we introduce Minimax Optimization Predictive Inference (MOPI), a framework that generalizes prior work by optimizing over a flexible class of set-valued mappings during the calibration phase, rather than simply calibrating a fixed sublevel set. This minimax formulation effectively circumvents the structural constraints of predefined score functions, achieving superior shape adaptivity while maintaining a principled connection to the minimization of mean squared coverage error. Theoretically, we provide non-asymptotic oracle inequalities and show that the convergence rate of the coverage error attains the optimal order under regular conditions. The MOPI also enables valid inference conditional on sensitive attributes that are available during calibration but unobserved at test time. Empirical results on complex, non-standard conditional distributions demonstrate that MOPI produces more efficient prediction sets than existing baselines.





Total Variation Classes Beyond 1d: Minimax Rates, and the Limitations of Linear Smoothers

Neural Information Processing Systems

We consider the problem of estimating a function defined over nlocations on a d-dimensional grid (having all side lengths equal to n1/d). When the function is constrained to have discrete total variation bounded by Cn, we derive the minimax optimal (squared) `2 estimation error rate, parametrized by n,Cn. Total variation denoising, also known as the fused lasso, is seen to be rate optimal. Several simpler estimators exist, such as Laplacian smoothing and Laplacian eigenmaps. A natural question is: can these simpler estimators perform just as well?



Linear Time Approximation Algorithm for Column Subset Selection with Local Search

Neural Information Processing Systems

The Column Subset Selection (CSS) problem has been widely studied in dimensionality reduction and feature selection. The goal of the CSS problem is to output a submatrix S, consisting of k columns from an n d input matrix A that minimizes the residual error A-SS^\dagger A _F^2, where S^\dagger is the Moore-Penrose inverse matrix of S. Many previous approximation algorithms have non-linear running times in both n and d, while the existing linear-time algorithms have a relatively larger approximation ratios. Additionally, the local search algorithms in existing results for solving the CSS problem are heuristic. To achieve linear running time while maintaining better approximation using a local search strategy, we propose a local search-based approximation algorithm for the CSS problem with exactly k columns selected.