Goto

Collaborating Authors

 Optimization


Affine-Invariant Robust Training

arXiv.org Machine Learning

The field of adversarial robustness has attracted significant attention in machine learning. Contrary to the common approach of training models that are accurate in average case, it aims at training models that are accurate for worst case inputs, hence it yields more robust and reliable models. Put differently, it tries to prevent an adversary from fooling a model. The study of adversarial robustness is largely focused on $\ell_p-$bounded adversarial perturbations, i.e. modifications of the inputs, bounded in some $\ell_p$ norm. Nevertheless, it has been shown that state-of-the-art models are also vulnerable to other more natural perturbations such as affine transformations, which were already considered in machine learning within data augmentation. This project reviews previous work in spatial robustness methods and proposes evolution strategies as zeroth order optimization algorithms to find the worst affine transforms for each input. The proposed method effectively yields robust models and allows introducing non-parametric adversarial perturbations.


Olympus: a benchmarking framework for noisy optimization and experiment planning

arXiv.org Machine Learning

Research challenges encountered across science, engineering, and economics can frequently be formulated as optimization tasks. In chemistry and materials science, recent growth in laboratory digitization and automation has sparked interest in optimization-guided autonomous discovery and closed-loop experimentation. Experiment planning strategies based on off-the-shelf optimization algorithms can be employed in fully autonomous research platforms to achieve desired experimentation goals with the minimum number of trials. However, the experiment planning strategy that is most suitable to a scientific discovery task is a priori unknown while rigorous comparisons of different strategies are highly time and resource demanding. As optimization algorithms are typically benchmarked on low-dimensional synthetic functions, it is unclear how their performance would translate to noisy, higher-dimensional experimental tasks encountered in chemistry and materials science. We introduce Olympus, a software package that provides a consistent and easy-to-use framework for benchmarking optimization algorithms against realistic experiments emulated via probabilistic deep-learning models. Olympus includes a collection of experimentally derived benchmark sets from chemistry and materials science and a suite of experiment planning strategies that can be easily accessed via a user-friendly python interface. Furthermore, Olympus facilitates the integration, testing, and sharing of custom algorithms and user-defined datasets. In brief, Olympus mitigates the barriers associated with benchmarking optimization algorithms on realistic experimental scenarios, promoting data sharing and the creation of a standard framework for evaluating the performance of experiment planning strategies


A survey of algorithmic recourse: definitions, formulations, solutions, and prospects

arXiv.org Artificial Intelligence

Machine learning is increasingly used to inform decision-making in sensitive situations where decisions have consequential effects on individuals' lives. In these settings, in addition to requiring models to be accurate and robust, socially relevant values such as fairness, privacy, accountability, and explainability play an important role for the adoption and impact of said technologies. In this work, we focus on algorithmic recourse, which is concerned with providing explanations and recommendations to individuals who are unfavourably treated by automated decision-making systems. We first perform an extensive literature review, and align the efforts of many authors by presenting unified definitions, formulations, and solutions to recourse. Then, we provide an overview of the prospective research directions towards which the community may engage, challenging existing assumptions and making explicit connections to other ethical challenges such as security, privacy, and fairness.


FetchSGD: Communication-Efficient Federated Learning with Sketching

arXiv.org Machine Learning

Existing approaches to federated learning suffer from a communication bottleneck as well as convergence issues due to sparse client participation. In this paper we introduce a novel algorithm, called FetchSGD, to overcome these challenges. FetchSGD compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers. A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch. This allows the algorithm to move momentum and error accumulation from clients to the central aggregator, overcoming the challenges of sparse client participation while still achieving high compression rates and good convergence. We prove that FetchSGD has favorable convergence guarantees, and we demonstrate its empirical effectiveness by training two residual networks and a transformer model.


Prior-guided Bayesian Optimization

arXiv.org Machine Learning

While Bayesian Optimization (BO) is a very popular method for optimizing expensive black-box functions, it fails to leverage the experience of domain experts. This causes BO to waste function evaluations on bad design choices (e.g., machine learning hyperparameters) that the expert already knows to work poorly. To address this issue, we introduce Prior-guided Bayesian Optimization (PrBO). PrBO allows users to inject their knowledge into the optimization process in the form of priors about which parts of the input space will yield the best performance, rather than BO's standard priors over functions (which are much less intuitive for users). PrBO then combines these priors with BO's standard probabilistic model to form a pseudo-posterior used to select which points to evaluate next. We show that PrBO is around 12x faster than state-of-the-art methods without user priors and 10,000x faster than random search on a common suite of benchmarks, and achieves a new state-of-the-art performance on a real-world hardware design application. We also show that PrBO converges faster even if the user priors are not entirely accurate and that it robustly recovers from misleading priors.


Additive Tree-Structured Conditional Parameter Spaces in Bayesian Optimization: A Novel Covariance Function and a Fast Implementation

arXiv.org Machine Learning

Bayesian optimization (BO) is a sample-efficient global optimization algorithm for black-box functions which are expensive to evaluate. Existing literature on model based optimization in conditional parameter spaces are usually built on trees. In this work, we generalize the additive assumption to tree-structured functions and propose an additive tree-structured covariance function, showing improved sample-efficiency, wider applicability and greater flexibility. Furthermore, by incorporating the structure information of parameter spaces and the additive assumption in the BO loop, we develop a parallel algorithm to optimize the acquisition function and this optimization can be performed in a low dimensional space. We demonstrate our method on an optimization benchmark function, on a neural network compression problem, on pruning pre-trained VGG16 and ResNet50 models as well as on searching activation functions of ResNet20. Experimental results show our approach significantly outperforms the current state of the art for conditional parameter optimization including SMAC, TPE and Jenatton et al. (2017).


Gaussian Process Models with Low-Rank Correlation Matrices for Both Continuous and Categorical Inputs

arXiv.org Machine Learning

We introduce a method that uses low-rank approximations of cross-correlation matrices in mixed continuous and categorical Gaussian Process models. This new method -- called Low-Rank Correlation (LRC) -- offers the ability to flexibly adapt the number of parameters to the problem at hand by choosing an appropriate rank of the approximation. Furthermore, we present a systematic approach of defining test functions that can be used for assessing the accuracy of models or optimization methods that are concerned with both continuous and categorical inputs. We compare LRC to existing approaches of modeling the cross-correlation matrix. It turns out that the new approach performs well in terms of estimation of cross-correlations and response surface prediction. Therefore, LRC is a flexible and useful addition to existing methods, especially for increasing numbers of combinations of levels of the categorical inputs.


Improving Inference for Neural Image Compression

arXiv.org Machine Learning

We consider the problem of lossy image compression with deep latent variable models. State-of-the-art methods build on hierarchical variational autoencoders (VAEs) and learn inference networks to predict a compressible latent representation of each data point. Drawing on the variational inference perspective on compression, we identify three approximation gaps which limit performance in the conventional approach: (i) an amortization gap, (ii) a discretization gap, and (iii) a marginalization gap. We propose improvements to each of these three shortcomings based on ideas related to iterative inference, stochastic annealing for discrete optimization, and bits-back coding, resulting in the first application of bits-back coding to lossy compression. In our experiments, which include extensive baseline comparisons and ablation studies, we achieve new state-of-the-art performance on lossy image compression using an established VAE architecture, by changing only the inference method.


Positive semi-definite embedding for dimensionality reduction and out-of-sample extensions

arXiv.org Machine Learning

In machine learning or statistics, it is often desirable to reduce the dimensionality of a sample of data points in a high dimensional space $\mathbb{R}^d$. This paper introduces a dimensionality reduction method where the embedding coordinates are the eigenvectors of a positive semi-definite kernel obtained as the solution of an infinite dimensional analogue of a semi-definite program. This embedding is adaptive and non-linear. A main feature of our approach is the existence of a non-linear out-of-sample extension formula of the embedding coordinates, called a projected Nystr\"om approximation. This extrapolation formula yields an extension of the kernel matrix to a data-dependent Mercer kernel function. Our empirical results indicate that this embedding method is more robust with respect to the influence of outliers, compared with a spectral embedding method.


Interpretable Sequence Classification via Discrete Optimization

arXiv.org Artificial Intelligence

Sequence classification is the task of predicting a class label given a sequence of observations. In many applications such as healthcare monitoring or intrusion detection, early classification is crucial to prompt intervention. In this work, we learn sequence classifiers that favour early classification from an evolving observation trace. While many state-of-the-art sequence classifiers are neural networks, and in particular LSTMs, our classifiers take the form of finite state automata and are learned via discrete optimization. Our automata-based classifiers are interpretable---supporting explanation, counterfactual reasoning, and human-in-the-loop modification---and have strong empirical performance. Experiments over a suite of goal recognition and behaviour classification datasets show our learned automata-based classifiers to have comparable test performance to LSTM-based classifiers, with the added advantage of being interpretable.