Optimization
High-Dimensional Robust Mean Estimation in Nearly-Linear Time
Cheng, Yu, Diakonikolas, Ilias, Ge, Rong
We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted. Recent work gave the first polynomial time algorithms for this problem with dimension-independent error guarantees for several families of structured distributions. In this work, we give the first nearly-linear time algorithms for high-dimensional robust mean estimation. Specifically, we focus on distributions with (i) known covariance and sub-gaussian tails, and (ii) unknown bounded covariance. Given $N$ samples on $\mathbb{R}^d$, an $\epsilon$-fraction of which may be arbitrarily corrupted, our algorithms run in time $\tilde{O}(Nd) / \mathrm{poly}(\epsilon)$ and approximate the true mean within the information-theoretically optimal error, up to constant factors. Previous robust algorithms with comparable error guarantees have running times $\tilde{\Omega}(N d^2)$, for $\epsilon = \Omega(1)$. Our algorithms rely on a natural family of SDPs parameterized by our current guess $\nu$ for the unknown mean $\mu^\star$. We give a win-win analysis establishing the following: either a near-optimal solution to the primal SDP yields a good candidate for $\mu^\star$ -- independent of our current guess $\nu$ -- or the dual SDP yields a new guess $\nu'$ whose distance from $\mu^\star$ is smaller by a constant factor. We exploit the special structure of the corresponding SDPs to show that they are approximately solvable in nearly-linear time. Our approach is quite general, and we believe it can also be applied to obtain nearly-linear time algorithms for other high-dimensional robust learning problems.
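The SDP-based algorithm described above is not reproduced here. For orientation, the sketch below shows a much simpler and well-known baseline for the same setting: a filtering heuristic. It assumes the inliers have identity covariance. It uses power iteration to find the direction of largest empirical variance, then discards the points that project furthest along that direction, and repeats until no direction looks suspicious. Each power-iteration step costs O(Nd) because the d x d covariance is never formed. The threshold `2.0`, the 5% removal rate, and all function names are illustrative assumptions. The sketch carries none of the paper's guarantees.

```python
import math
import random


def mean(points):
    """Coordinate-wise empirical mean of a list of d-dimensional points."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def top_direction(points, mu, iters=50, seed=0):
    """Power iteration for the top eigenpair of the empirical covariance.

    Sigma v is computed as (1/N) sum_i x_i <x_i, v>, so each iteration costs
    O(N d) and the d x d covariance matrix is never materialized.
    """
    rng = random.Random(seed)
    d = len(mu)
    centered = [[p[j] - mu[j] for j in range(d)] for p in points]
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        w = [0.0] * d
        for x in centered:
            s = sum(xj * vj for xj, vj in zip(x, v))
            for j in range(d):
                w[j] += s * x[j]
        w = [wj / len(points) for wj in w]
        lam = math.sqrt(sum(wj * wj for wj in w))  # ||Sigma v|| for unit v
        if lam == 0.0:
            break
        v = [wj / lam for wj in w]
    return v, lam


def filtered_mean(points, threshold=2.0, drop_frac=0.05, max_rounds=50):
    """Simplified filter (assumption: inliers have identity covariance)."""
    pts = list(points)
    for _ in range(max_rounds):
        mu = mean(pts)
        v, lam = top_direction(pts, mu)
        if lam <= threshold:
            break  # no direction has suspiciously large variance
        # Score each point by its squared deviation along the suspicious direction
        scores = [sum((p[j] - mu[j]) * v[j] for j in range(len(mu))) ** 2 for p in pts]
        k = max(1, int(drop_frac * len(pts)))
        if len(pts) <= k:
            break
        order = sorted(range(len(pts)), key=lambda i: scores[i])
        pts = [pts[i] for i in order[:-k]]  # drop the k highest-scoring points
    return mean(pts)


rng = random.Random(1)
d = 5
inliers = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(450)]
outliers = [[8.0] * d for _ in range(50)]  # epsilon = 0.1, all at one point
data = inliers + outliers
naive_err = math.dist(mean(data), [0.0] * d)
robust_err = math.dist(filtered_mean(data), [0.0] * d)
print(f"naive error {naive_err:.2f}, filtered error {robust_err:.2f}")
```

In this toy run, the empirical mean is pulled far toward the planted outliers. The filter removes them within a few rounds and then stops, once the top eigenvalue of the remaining points falls below the threshold.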
Learning Multiple Defaults for Machine Learning Algorithms
Pfisterer, Florian, van Rijn, Jan N., Probst, Philipp, Müller, Andreas, Bischl, Bernd
The performance of modern machine learning methods depends heavily on their hyperparameter configurations. One simple way of selecting a configuration is to use default settings, often proposed along with the publication and implementation of a new algorithm. These default values are usually chosen in an ad hoc manner to work well enough on a wide variety of datasets. To address this problem, various automatic hyperparameter configuration algorithms have been proposed, which select an optimal configuration per dataset. This principled approach usually improves performance, but it adds algorithmic complexity and computational cost to the training procedure. As an alternative, we propose learning a set of complementary default values from a large database of prior empirical results. Selecting an appropriate configuration on a new dataset then requires only a simple, efficient and embarrassingly parallel search over this set. We demonstrate the effectiveness and efficiency of the proposed approach in comparison to random search and Bayesian optimization.
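The abstract does not say how the set of defaults is learned. One simple option is greedy forward selection over a table of prior results, shown here as an assumption-laden sketch. At each step it adds the configuration that most increases the sum, over datasets, of the best score achieved by any chosen configuration. The table `perf`, its values, and the function name are hypothetical.

```python
def greedy_defaults(perf, k):
    """Greedily pick k complementary default configurations.

    perf[c][t] is the (hypothetical) score of configuration c on prior
    dataset t; higher is better. Returns chosen indices in pick order.
    """
    n_configs, n_tasks = len(perf), len(perf[0])
    chosen = []
    best = [float("-inf")] * n_tasks  # best score per dataset so far

    def covered(c):
        # Total score over datasets if configuration c were added to the set
        return sum(max(best[t], perf[c][t]) for t in range(n_tasks))

    for _ in range(min(k, n_configs)):
        candidates = [c for c in range(n_configs) if c not in chosen]
        c_star = max(candidates, key=covered)
        chosen.append(c_star)
        best = [max(best[t], perf[c_star][t]) for t in range(n_tasks)]
    return chosen


perf = [
    [0.90, 0.50, 0.50],  # config 0: strong on dataset 0 only
    [0.60, 0.60, 0.60],  # config 1: mediocre everywhere
    [0.50, 0.90, 0.85],  # config 2: strong on datasets 1 and 2
]
print(greedy_defaults(perf, 2))  # [2, 0]
```

Config 1 is never picked, even though it is the most uniform, because it does not complement config 2. On a new dataset, only the k chosen configurations need to be evaluated. Those evaluations are independent, which is what makes the search embarrassingly parallel.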
Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior
Wang, Zi, Kim, Beomjoon, Kaelbling, Leslie Pack
Bayesian optimization usually assumes that a Bayesian prior is given. However, the strong theoretical guarantees in Bayesian optimization are often regrettably compromised in practice because of unknown parameters in the prior. In this paper, we adopt a variant of empirical Bayes and show that, by estimating the Gaussian process prior from offline data sampled from the same prior and constructing unbiased estimators of the posterior, variants of both GP-UCB and probability of improvement achieve a near-zero regret bound, which decreases to a constant proportional to the observational noise as the amount of offline data and the number of online evaluations increase. Empirically, we have verified our approach on challenging simulated robotic problems featuring task and motion planning.
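A minimal sketch of the idea on a finite domain makes the pipeline concrete. It rests on several assumptions:

- Each offline function is observed at every grid point.
- The prior is estimated with the plain sample mean and sample covariance. This is a plug-in estimate, not the unbiased posterior estimators the paper constructs.
- GP-UCB runs with a fixed, hypothetical exploration weight `beta`.

Conditioning on one noisy observation at a time keeps each update a rank-one formula, so no matrix inverse is needed. All names and data are illustrative.

```python
import math
import random


def estimate_prior(offline):
    """Plug-in estimate of the prior mean and covariance on a finite grid.

    offline[m][a] is the value of the m-th offline function at grid point a.
    """
    M, n = len(offline), len(offline[0])
    mu = [sum(f[a] for f in offline) / M for a in range(n)]
    K = [[sum((f[a] - mu[a]) * (f[b] - mu[b]) for f in offline) / (M - 1)
          for b in range(n)] for a in range(n)]
    return mu, K


def condition(mu, K, i, y, noise_var):
    """Exact Gaussian update after observing y = f(i) + N(0, noise_var)."""
    n = len(mu)
    denom = K[i][i] + noise_var
    col = [K[a][i] for a in range(n)]
    new_mu = [mu[a] + col[a] * (y - mu[i]) / denom for a in range(n)]
    new_K = [[K[a][b] - col[a] * col[b] / denom for b in range(n)] for a in range(n)]
    return new_mu, new_K


def ucb_index(mu, K, beta):
    """GP-UCB rule: maximize posterior mean + beta * posterior std."""
    return max(range(len(mu)), key=lambda a: mu[a] + beta * math.sqrt(max(K[a][a], 0.0)))


def run_ucb(f, mu, K, steps, beta=2.0, noise_var=0.01, seed=0):
    """Query the (hypothetical) target f for `steps` rounds; return queried indices."""
    rng = random.Random(seed)
    queried = []
    for _ in range(steps):
        i = ucb_index(mu, K, beta)
        y = f[i] + rng.gauss(0.0, math.sqrt(noise_var))
        mu, K = condition(mu, K, i, y, noise_var)
        queried.append(i)
    return queried


rng = random.Random(3)
shape = [0.0, 0.5, 1.0, 0.5, 0.0]  # hypothetical structure shared by all tasks
offline = []
for _ in range(50):
    c = rng.gauss(1.0, 0.3)
    offline.append([c * s + rng.gauss(0.0, 0.05) for s in shape])
mu_hat, K_hat = estimate_prior(offline)
print(run_ucb([1.2 * s for s in shape], mu_hat, K_hat, steps=5))
```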
Machine Learning Models: Bias Mitigation Strategies - DZone AI
In this post, you will learn about some bias mitigation strategies that can be applied in the ML model development lifecycle (MDLC) to build discrimination-aware machine learning models. The primary objective is to achieve high model accuracy while ensuring that the model is less discriminatory with respect to sensitive or protected attributes. In simple words, the output of the classifier should not correlate with protected or sensitive attributes. Building such ML models therefore becomes a multi-objective optimization problem. The quality of the classifier is measured both by its accuracy and by the discrimination it exhibits on the basis of sensitive attributes: the more accurate, the better, and the less discriminatory (with respect to sensitive attributes), the better.
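One simple way to handle the two objectives is to scalarize them, shown here as a sketch. Discrimination is measured with the demographic parity difference: the gap in positive-prediction rates between two groups. This is only one of several possible fairness metrics. The sketch then chooses a decision threshold that maximizes accuracy minus `lam` times that gap, which is a post-processing strategy. The scores, labels, groups, weight `lam`, and candidate thresholds are all made up for illustration.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def parity_gap(y_pred, group):
    """|P(pred=1 | group=1) - P(pred=1 | group=0)| (demographic parity difference)."""
    rates = []
    for g in (0, 1):
        preds = [p for p, gi in zip(y_pred, group) if gi == g]
        rates.append(sum(preds) / len(preds))
    return abs(rates[1] - rates[0])


def pick_threshold(scores, y_true, group, lam, thresholds):
    """Scalarized objective: maximize accuracy - lam * parity gap."""
    def objective(t):
        y_pred = [int(s >= t) for s in scores]
        return accuracy(y_true, y_pred) - lam * parity_gap(y_pred, group)
    return max(thresholds, key=objective)


# Hypothetical classifier scores, true labels, and protected-group membership
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
group = [1, 1, 1, 1, 0, 0, 0, 0]
thresholds = [0.15, 0.35, 0.5, 0.65]

print(pick_threshold(scores, y_true, group, 0.0, thresholds))  # accuracy only
print(pick_threshold(scores, y_true, group, 1.0, thresholds))  # trades accuracy for parity
```

With `lam = 0`, the accuracy-maximizing threshold wins even though its parity gap is 0.5. With `lam = 1`, a lower threshold gives up one correct prediction to halve the gap. Sweeping `lam` traces out the trade-off between the two objectives.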
Marginal Weighted Maximum Log-likelihood for Efficient Learning of Perturb-and-Map models
Shpakova, Tatiana, Bach, Francis, Osokin, Anton
We consider the structured-output prediction problem through probabilistic approaches and generalize the "perturb-and-MAP" framework to more challenging weighted Hamming losses, which are crucial in applications. While in principle our approach is a straightforward marginalization, it requires solving many related MAP inference problems. We show that for log-supermodular pairwise models these operations can be performed efficiently using the machinery of dynamic graph cuts. We also propose to use double stochastic gradient descent, both on the data and on the perturbations, for efficient learning. Our framework can naturally take weak supervision (e.g., partial labels) into account. We conduct a set of experiments on medium-scale character recognition and image segmentation, showing the benefits of our algorithms.
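The identity behind "perturb-and-MAP" is the Gumbel-max trick. Suppose the score of every joint configuration is perturbed by an independent Gumbel(0, 1) variable. Then the MAP of the perturbed model is an exact sample from the Gibbs distribution p(x) ∝ exp(score(x)). The sketch below uses such full perturbations on a hypothetical three-variable binary chain that is small enough to enumerate. Practical perturb-and-MAP models, including the log-supermodular ones considered here, perturb only low-order potentials and solve the MAP with graph cuts. In general that defines a distribution different from the Gibbs one. The potentials and names are illustrative.

```python
import itertools
import math
import random

UNARY = [0.5, -0.2, 0.1]  # hypothetical per-variable bonus for x_i = 1
COUPLING = 0.8            # hypothetical bonus when chain neighbours agree


def score(x):
    s = sum(u * xi for u, xi in zip(UNARY, x))
    s += sum(COUPLING * (x[i] == x[i + 1]) for i in range(len(x) - 1))
    return s


CONFIGS = list(itertools.product((0, 1), repeat=len(UNARY)))


def gumbel(rng):
    u = rng.random()
    while u == 0.0:  # avoid log(0); u is then in (0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def perturb_and_map(rng):
    """MAP of the fully perturbed model (one Gumbel per configuration)."""
    return max(CONFIGS, key=lambda x: score(x) + gumbel(rng))


def gibbs_probs():
    z = sum(math.exp(score(x)) for x in CONFIGS)
    return {x: math.exp(score(x)) / z for x in CONFIGS}


rng = random.Random(0)
n = 20000
counts = {x: 0 for x in CONFIGS}
for _ in range(n):
    counts[perturb_and_map(rng)] += 1
max_err = max(abs(counts[x] / n - p) for x, p in gibbs_probs().items())
print(f"largest gap between empirical and Gibbs probabilities: {max_err:.4f}")
```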
Using AI to Design Stone Jewelry
Gupta, Khyatti, Damani, Sonam, Narahari, Kedhar Nath
Jewelry has been an integral part of human culture for ages. One of the most popular styles of jewelry is created by putting together precious and semi-precious stones in diverse patterns. While technology is finding its way into the production process of such jewelry, designing it remains a time-consuming and involved task. In this paper, we propose a unique approach using optimization methods coupled with machine learning techniques to generate novel stone jewelry designs at scale. Our evaluation shows that designs generated by our approach are highly likeable and visually appealing.
Sampling Can Be Faster Than Optimization
Ma, Yi-An, Chen, Yuansi, Jin, Chi, Flammarion, Nicolas, Jordan, Michael I.
Optimization algorithms and Monte Carlo sampling algorithms have provided the computational foundations for the rapid growth in applications of statistical machine learning in recent years. There is, however, limited theoretical understanding of the relationships between these two kinds of methodology, and limited understanding of relative strengths and weaknesses. Moreover, existing results have been obtained primarily in the setting of convex functions (for optimization) and log-concave functions (for sampling). In this setting, where local properties determine global properties, optimization algorithms are unsurprisingly more efficient computationally than sampling algorithms. We instead examine a class of nonconvex objective functions that arise in mixture modeling and multi-stable systems. In this nonconvex setting, we find that the computational complexity of sampling algorithms scales linearly with the model dimension while that of optimization algorithms scales exponentially.
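The one-dimensional toy below makes the two kinds of methodology concrete. It does not reproduce the paper's setting or analysis. Gradient descent on the nonconvex double-well potential U(x) = (x^2 - 1)^2 converges to whichever minimum is nearest its starting point. The unadjusted Langevin algorithm, x <- x - h U'(x) + sqrt(2h) xi with xi ~ N(0, 1), instead approximately samples from p(x) ∝ exp(-U(x)) and keeps moving between both wells. The step size and step counts are arbitrary choices.

```python
import math
import random


def grad_double_well(x):
    """Gradient of U(x) = (x^2 - 1)^2."""
    return 4.0 * x * (x * x - 1.0)


def gradient_descent(grad, x0, h, steps):
    x = x0
    for _ in range(steps):
        x -= h * grad(x)
    return x


def ula(grad, x0, h, steps, seed=0):
    """Unadjusted Langevin algorithm; returns the whole trajectory."""
    rng = random.Random(seed)
    x, xs = x0, []
    noise = math.sqrt(2.0 * h)
    for _ in range(steps):
        x = x - h * grad(x) + noise * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


x_gd = gradient_descent(grad_double_well, x0=0.5, h=0.01, steps=2000)
samples = ula(grad_double_well, x0=0.5, h=0.01, steps=200000)
frac_right = sum(x > 0 for x in samples) / len(samples)
print(f"GD ends at {x_gd:.4f}; Langevin spends {frac_right:.2f} of its time at x > 0")
```

ULA has a discretization bias. On the Gaussian target U(x) = x^2/2, the update becomes x <- (1 - h) x + sqrt(2h) xi, so its stationary variance is 1/(1 - h/2) rather than 1.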
Approximate Eigenvalue Decompositions of Linear Transformations with a Few Householder Reflectors
The ability to decompose a signal in an orthonormal basis (a set of orthogonal components, each normalized to have unit length) using a fast numerical procedure lies at the heart of many signal processing methods and applications. The classic examples are the Fourier and wavelet transforms that enjoy numerically efficient implementations (FFT and FWT, respectively). Unfortunately, orthonormal transformations are in general unstructured, and therefore they do not enjoy low computational complexity properties. In this paper, based on Householder reflectors, we introduce a class of orthonormal matrices that are numerically efficient to manipulate: we control the complexity of matrix-vector multiplications with these matrices using a given parameter. We provide numerical algorithms that approximate any orthonormal or symmetric transform with a new orthonormal or symmetric structure made up of products of a given number of Householder reflectors. We present analyses and numerical evidence to highlight the accuracy of the proposed approximations and provide an application to the case of learning fast Mahalanobis distance metric transformations.
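A small sketch illustrates the cost argument. A Householder reflector H = I - 2 u u^T / (u^T u) can be applied to a vector in O(d) operations without forming H. A product of k reflectors therefore costs O(kd) per matrix-vector multiplication, compared with O(d^2) for a dense orthonormal matrix. The sketch only applies a given product. How the reflector vectors `us` are fitted to approximate a target transform (the algorithms in the paper) is not reproduced, and all names are illustrative.

```python
import math
import random


def reflect(u, x):
    """Apply H = I - 2 u u^T / (u^T u) to x in O(d), without forming H."""
    uu = sum(a * a for a in u)
    coef = 2.0 * sum(a * b for a, b in zip(u, x)) / uu
    return [xi - coef * ui for xi, ui in zip(x, u)]


def apply_product(us, x):
    """Compute H_1 H_2 ... H_k x in O(k d): the rightmost factor acts first."""
    for u in reversed(us):
        x = reflect(u, x)
    return x


def apply_transpose(us, x):
    """Compute (H_1 ... H_k)^T x = H_k ... H_1 x, since each H_i is symmetric."""
    for u in us:
        x = reflect(u, x)
    return x


rng = random.Random(0)
d, k = 8, 3  # k reflectors: O(k d) per product instead of O(d^2)
us = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y = apply_product(us, x)
print(math.hypot(*x), math.hypot(*y))  # equal up to rounding: the product is orthonormal
```

Because each reflector is symmetric and its own inverse, the transpose (and hence the inverse) of the product costs the same as the product itself.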
Learning Features and Abstract Actions for Computing Generalized Plans
Bonet, Blai, Francès, Guillem, Geffner, Hector
Generalized planning is concerned with the computation of plans that solve not one but multiple instances of a planning domain. Recently, it has been shown that generalized plans can be expressed as mappings of feature values into actions, and that they can often be computed with fully observable non-deterministic (FOND) planners. The actions in such plans, however, are not the actions in the instances themselves, which are not necessarily common to other instances, but abstract actions that are defined on a set of common features. The formulation assumes that the features and the abstract actions are given. In this work, we address this limitation by showing how to learn them automatically. The resulting account of generalized planning combines learning and planning in a novel way: a learner, based on a Max SAT formulation, yields the features and abstract actions from sampled state transitions, and a FOND planner uses this information, suitably transformed, to produce the general plans. Correctness guarantees are given and experimental results on several domains are reported.
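The toy below illustrates the plan representation only, not the Max SAT learner. The hypothetical goal is to clear a block `x`. Two features suffice: n, the number of blocks above `x`, and H, whether the gripper holds a block. Two abstract actions are defined only by their effects on those features. The same two feature-based rules then solve instances with any number of blocks above `x`. The state encoding and all names are assumptions made for the example.

```python
def features(state):
    """Feature values of a concrete state (tower listed top to bottom, held block)."""
    tower, held = state
    return {"n": tower.index("x"), "H": held is not None}


def policy(f):
    """Generalized plan: rules over qualitative feature values -> abstract action."""
    if f["H"]:
        return "put-aside"     # effect: H becomes false, n unchanged
    if f["n"] > 0:
        return "pick-above-x"  # effect: H becomes true, n decreases
    return None                # n = 0: x is clear, goal reached


def execute(action, state):
    """Map an abstract action to a concrete one for this (hypothetical) encoding."""
    tower, held = state
    if action == "pick-above-x":
        return tower[1:], tower[0]  # take the top block of x's tower
    if action == "put-aside":
        return tower, None          # put the held block on the table (not tracked)
    raise ValueError(action)


def run(tower, max_steps=10_000):
    state, steps = (list(tower), None), 0
    while steps < max_steps:
        action = policy(features(state))
        if action is None:
            break
        state, steps = execute(action, state), steps + 1
    return state, steps


for k in (0, 3, 7):
    _, steps = run([f"b{i}" for i in range(k)] + ["x"])
    print(k, steps)
```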
Efficient Online Portfolio with Logarithmic Regret
Luo, Haipeng, Wei, Chen-Yu, Zheng, Kai
We study the decades-old problem of online portfolio management and propose the first algorithm with logarithmic regret that is not based on Cover's Universal Portfolio algorithm and admits much faster implementation. Specifically, Universal Portfolio enjoys optimal regret $\mathcal{O}(N\ln T)$ for $N$ financial instruments over $T$ rounds, but requires log-concave sampling and has a large polynomial running time. Our algorithm, on the other hand, ensures a slightly larger but still logarithmic regret of $\mathcal{O}(N^2(\ln T)^4)$, and is based on the well-studied Online Mirror Descent framework with a novel regularizer that can be implemented via standard optimization methods in time $\mathcal{O}(TN^{2.5})$ per round. The regret of all other existing works is either polynomial in $T$ or has a potentially unbounded factor such as the inverse of the smallest price relative.
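In online portfolio selection, regret is standardly measured against the best constant rebalanced portfolio (CRP) in hindsight. Here r_t is the vector of price relatives in round t. The regret is the maximum over the simplex of sum_t ln <x, r_t>, minus the online log-wealth sum_t ln <x_t, r_t>. The sketch below only evaluates this quantity for N = 2 assets, finding the best CRP by a coarse grid search. It is not the Online Mirror Descent algorithm described above, and the price data are made up.

```python
import math


def log_wealth(portfolios, relatives):
    """sum_t ln <x_t, r_t>: log of the final wealth, starting from wealth 1."""
    return sum(math.log(sum(xi * ri for xi, ri in zip(x, r)))
               for x, r in zip(portfolios, relatives))


def best_crp_two_assets(relatives, grid=100):
    """Best constant rebalanced portfolio in hindsight, by grid search (N = 2 only)."""
    best = -math.inf
    for i in range(grid + 1):
        b = i / grid
        best = max(best, log_wealth([(1 - b, b)] * len(relatives), relatives))
    return best


def regret(portfolios, relatives):
    return best_crp_two_assets(relatives) - log_wealth(portfolios, relatives)


# Asset 0 is cash; asset 1 alternately doubles and halves in price.
relatives = [(1.0, 2.0), (1.0, 0.5)] * 2
all_cash = [(1.0, 0.0)] * len(relatives)
print(regret(all_cash, relatives))  # the 50/50 CRP grows by 1.125 every two rounds
```

The stock alone makes no money here: it doubles and then halves. Rebalancing to 50/50 every round still grows wealth by a factor of 1.5 x 0.75 = 1.125 per two rounds. That is why the comparator is the best CRP rather than the best single asset.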