Statistical Learning
Supplementary Material Automatic Unsupervised Outlier Model Selection
Model set Mis composed by pairing outlier detection algorithms to distinct hyperparameter choices. Table 2 provides a comprehensive description of models, including 302 unique models composed by 8 popular outlier detection (OD) algorithms. All models and parameters are based on the Python Outlier Detection Toolbox (PyOD)5. B.1 Complete List of Meta-features We summarize the meta-features used by METAOD in Table 3. When applicable, we provide the formula for computing the meta-feature(s) and corresponding variants.
Automatic Unsupervised Outlier Model Selection
Given an unsupervised outlier detection task on a new dataset, how can we automatically select a good outlier detection algorithm and its hyperparameter(s) (collectively called a model)? In this work, we tackle the unsupervised outlier model selection (UOMS) problem, and propose METAOD, a principled, data-driven approach to UOMS based on meta-learning. The UOMS problem is notoriously challenging, as compared to model selection for classification and clustering, since (i) model evaluation is infeasible due to the lack of hold-out data with labels, and (ii) model comparison is infeasible due to the lack of a universal objective function. METAOD capitalizes on the performances of a large body of detection models on historical outlier detection benchmark datasets, and carries over this prior experience to automatically select an effective model to be employed on a new dataset without any labels, model evaluations or model comparisons. To capture task similarity within our meta-learning framework, we introduce specialized metafeatures that quantify outlying characteristics of a dataset. Extensive experiments show that selecting a model by METAOD significantly outperforms no model selection (e.g.
Robust Regression Revisited: Acceleration and Improved Estimation Rates
Parameter estimation in generalized linear models, such as linear and logistic regression problems, is among the most fundamental and well-studied statistical optimization problems. It serves as the primary workhorse in statistical studies arising from a variety of disciplines, ranging from economics [Smi12], biology [VGSM05], and the social sciences [Gor10].
Robust Regression Revisited: Acceleration and Improved Estimation Rates
We study fast algorithms for statistical regression problems under the strong contamination model, where the goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples. Prior works in this line of research were based on the robust gradient descent framework of [PSBR20], a firstorder method using biased gradient queries, or the Sever framework of [DKK+19], an iterative outlier-removal method calling a stationary point finder. We present nearly-linear time algorithms for robust regression problems with improved runtime or estimation guarantees compared to the state-of-the-art.
On the Implicit Bias of Linear Equivariant Steerable Networks
We study the implicit bias of gradient flow on linear equivariant steerable networks in group-invariant binary classification. Our findings reveal that the parameterized predictor converges in direction to the unique group-invariant classifier with a maximum margin defined by the input group action. Under a unitary assumption on the input representation, we establish the equivalence between steerable networks and data augmentation. Furthermore, we demonstrate the improved margin and generalization bound of steerable networks over their non-invariant counterparts.
On the Implicit Bias of Linear Equivariant Steerable Networks
We study the implicit bias of gradient flow on linear equivariant steerable networks in group-invariant binary classification. Our findings reveal that the parameterized predictor converges in direction to the unique group-invariant classifier with a maximum margin defined by the input group action. Under a unitary assumption on the input representation, we establish the equivalence between steerable networks and data augmentation. Furthermore, we demonstrate the improved margin and generalization bound of steerable networks over their non-invariant counterparts.
Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem
In this paper, we study a class of stochastic bilevel optimization problems, also known as stochastic simple bilevel optimization, where we minimize a smooth stochastic objective function over the optimal solution set of another stochastic convex optimization problem. We introduce novel stochastic bilevel optimization methods that locally approximate the solution set of the lower-level problem via a stochastic cutting plane, and then run a conditional gradient update with variance reduction techniques to control the error induced by using stochastic gradients. For the case that the upper-level function is convex, our method requires O(max{1/ϵ2f,1/ϵ2g}) stochastic oracle queries to obtain a solution that is ϵfoptimal for the upper-level and ϵg-optimal for the lower-level. This guarantee improves the previous best-known complexity of O(max{1/ϵ4f,1/ϵ4g}). Moreover, for the case that the upper-level function is non-convex, our method requires at most O(max{1/ϵ3f,1/ϵ3g})stochastic oracle queries to find an (ϵf,ϵg)-stationary point. In the finite-sum setting, we show that the number of stochastic oracle calls required by our method are O( n/ϵ) and O( n/ϵ2) for the convex and non-convex settings, respectively, where ϵ = min{ϵf,ϵg}.