Goto

Collaborating Authors

 Optimization


Efficient Algorithms for Smooth Minimax Optimization

arXiv.org Machine Learning

This paper studies first order methods for solving smooth minimax optimization problems $\min_x \max_y g(x,y)$ where $g(\cdot,\cdot)$ is smooth and $g(x,\cdot)$ is concave for each $x$. In terms of $g(\cdot,y)$, we consider two settings -- strongly convex and nonconvex -- and improve upon the best known rates in both. For strongly-convex $g(\cdot, y),\ \forall y$, we propose a new algorithm combining Mirror-Prox and Nesterov's AGD, and show that it can find global optimum in $\tilde{O}(1/k^2)$ iterations, improving over current state-of-the-art rate of $O(1/k)$. We use this result along with an inexact proximal point method to provide $\tilde{O}(1/k^{1/3})$ rate for finding stationary points in the nonconvex setting where $g(\cdot, y)$ can be nonconvex. This improves over current best-known rate of $O(1/k^{1/5})$. Finally, we instantiate our result for finite nonconvex minimax problems, i.e., $\min_x \max_{1\leq i\leq m} f_i(x)$, with nonconvex $f_i(\cdot)$, to obtain convergence rate of $O(m(\log m)^{3/2}/k^{1/3})$ total gradient evaluations for finding a stationary point.


Generative Models for Automatic Chemical Design

arXiv.org Machine Learning

Materials discovery is decisive for tackling urgent challenges related to energy, the environment, health care and many others. In chemistry, conventional methodologies for innovation usually rely on expensive and incremental strategies to optimize properties from molecular structures. On the other hand, inverse approaches map properties to structures, thus expediting the design of novel useful compounds. In this chapter, we examine the way in which current deep generative models are addressing the inverse chemical discovery paradigm. We begin by revisiting early inverse design algorithms. Then, we introduce generative models for molecular systems and categorize them according to their architecture and molecular representation. Using this classification, we review the evolution and performance of important molecular generation schemes reported in the literature. Finally, we conclude highlighting the prospects and challenges of generative models as cutting edge tools in materials discovery.


Mixed-Variable Bayesian Optimization

arXiv.org Machine Learning

The optimization of expensive to evaluate, black-box, mixed-variable functions, i.e. functions that have continuous and discrete inputs, is a difficult and yet pervasive problem in science and engineering. In Bayesian optimization (BO), special cases of this problem that consider fully continuous or fully discrete domains have been widely studied. However, few methods exist for mixed-variable domains. In this paper, we introduce MiVaBo, a novel BO algorithm for the efficient optimization of mixed-variable functions that combines a linear surrogate model based on expressive feature representations with Thompson sampling. We propose two methods to optimize its acquisition function, a challenging problem for mixed-variable domains, and we show that MiVaBo can handle complex constraints over the discrete part of the domain that other methods cannot take into account. Moreover, we provide the first convergence analysis of a mixed-variable BO algorithm. Finally, we show that MiVaBo is significantly more sample efficient than state-of-the-art mixed-variable BO algorithms on hyperparameter tuning tasks.


Two-stage Optimization for Machine Learning Workflow

arXiv.org Artificial Intelligence

Machines learning techniques plays a preponderant role in dealing with massive amount of data and are employed in almost every possible domain. Building a high quality machine learning model to be deployed in production is a challenging task, from both, the subject matter experts and the machine learning practitioners. For a broader adoption and scalability of machine learning systems, the construction and configuration of machine learning workflow need to gain in automation. In the last few years, several techniques have been developed in this direction, known as autoML. In this paper, we present a two-stage optimization process to build data pipelines and configure machine learning algorithms. First, we study the impact of data pipelines compared to algorithm configuration in order to show the importance of data preprocessing over hyperparameter tuning. The second part presents policies to efficiently allocate search time between data pipeline construction and algorithm configuration. Those policies are agnostic from the metaoptimizer. Last, we present a metric to determine if a data pipeline is specific or independent from the algorithm, enabling fine-grain pipeline pruning and meta-learning for the coldstart problem.


The Cost of a Reductions Approach to Private Fair Optimization

arXiv.org Machine Learning

We examine a reductions approach to fair optimization and learning where a black-box optimizer is used to learn a fair model for classification or regression [Alabi et al., 2018, Agarwal et al., 2018] and explore the creation of such fair models that adhere to data privacy guarantees (specifically differential privacy). For this approach, we consider two suites of use cases: the first is for optimizing convex performance measures of the confusion matrix (such as $G$-mean and $H$-mean); the second is for satisfying statistical definitions of algorithmic fairness (such as equalized odds, demographic parity, and the gini index of inequality). The reductions approach to fair optimization can be abstracted as the constrained group-objective optimization problem where we aim to optimize an objective that is a function of losses of individual groups, subject to some constraints. We present two differentially private algorithms: an $(\epsilon, 0)$ exponential sampling algorithm and an $(\epsilon, \delta)$ algorithm that uses a linear optimizer to incrementally move toward the best decision. We analyze the privacy and utility guarantees of these empirical risk minimization algorithms. Compared to a previous method for ensuring differential privacy subject to a relaxed form of the equalized odds fairness constraint, the $(\epsilon, \delta)$ differentially private algorithm we present provides asymptotically better sample complexity guarantees. The technique of using an approximate linear optimizer oracle to achieve privacy might be applicable to other problems not considered in this paper. Finally, we show an algorithm-agnostic lower bound on the accuracy of any solution to the problem of $(\epsilon, 0)$ or $(\epsilon, \delta)$ private constrained group-objective optimization.


The Sensitivity of Counterfactual Fairness to Unmeasured Confounding

arXiv.org Machine Learning

Causal approaches to fairness have seen substantial recent interest, both from the machine learning community and from wider parties interested in ethical prediction algorithms. In no small part, this has been due to the fact that causal models allow one to simultaneously leverage data and expert knowledge to remove discriminatory effects from predictions. However, one of the primary assumptions in causal modeling is that you know the causal graph. This introduces a new opportunity for bias, caused by misspecifying the causal model. One common way for misspecification to occur is via unmeasured confounding: the true causal effect between variables is partially described by unobserved quantities. In this work we design tools to assess the sensitivity of fairness measures to this confounding for the popular class of non-linear additive noise models (ANMs). Specifically, we give a procedure for computing the maximum difference between two counterfactually fair predictors, where one has become biased due to confounding. For the case of bivariate confounding our technique can be swiftly computed via a sequence of closed-form updates. For multivariate confounding we give an algorithm that can be efficiently solved via automatic differentiation. We demonstrate our new sensitivity analysis tools in real-world fairness scenarios to assess the bias arising from confounding.


Forecasting high-dimensional dynamics exploiting suboptimal embeddings

arXiv.org Machine Learning

Delay embedding---a method for reconstructing dynamical systems by delay coordinates---is widely used to forecast nonlinear time series as a model-free approach. When multivariate time series are observed, several existing frameworks can be applied to yield a single forecast combining multiple forecasts derived from various embeddings. However, the performance of these frameworks is not always satisfactory because they randomly select embeddings or use brute force and do not consider the diversity of the embeddings to combine. Herein, we develop a forecasting framework that overcomes these existing problems. The framework exploits various "suboptimal embeddings" obtained by minimizing the in-sample error via combinatorial optimization. The framework achieves the best results among existing frameworks for sample toy datasets and a real-world flood dataset. We show that the framework is applicable to a wide range of data lengths and dimensions. Therefore, the framework can be applied to various fields such as neuroscience, ecology, finance, fluid dynamics, weather, and disaster prevention.


Approximate Sherali-Adams Relaxations for MAP Inference via Entropy Regularization

arXiv.org Machine Learning

Maximum a posteriori (MAP) inference is a fundamental computational paradigm for statistical inference. In the setting of graphical models, MAP inference entails solving a combinatorial optimization problem to find the most likely configuration of the discrete-valued model. Linear programming (LP) relaxations in the Sherali-Adams hierarchy are widely used to attempt to solve this problem. We leverage recent work in entropy-regularized linear programming to propose an iterative projection algorithm (SMPLP) for large scale MAP inference that is guaranteed to converge to a near-optimal solution to the relaxation. With an appropriately chosen regularization constant, we show the resulting rounded solution solves the exact MAP problem whenever the LP is tight. We further provide theoretical guarantees on the number of iterations sufficient to achieve $\epsilon$-close solutions. Finally, we show in practice that SMPLP is competitive for solving Sherali-Adams relaxations.


The Constrained $L_p$-$L_q$ Basis Pursuit Denoising Problem

arXiv.org Machine Learning

In this paper, we consider the constrained $L_p$-$L_q$ basis pursuit denoising problem, which aims to find a minimizer of $\|\bf{x}\|_p^p$ subject to $\|A\bf{x}-\bf{b}\|_q\leq\sigma$ for given $A \in \mathbb{R}^{m \times n}$, $\bf{b}\in\mathbb{R}^m$, $\sigma \geq0$, $0\leq p\leq1$ and $q \geq 1$. We first study the properties of the optimal solutions of this problem. Specifically, without any condition on the matrix $A$, we provide upper bounds in cardinality and infinity norm for the optimal solutions, and show that all optimal solutions must be on the boundary of the feasible set when $0


Online Continuous DR-Submodular Maximization with Long-Term Budget Constraints

arXiv.org Machine Learning

In this paper, we study a class of online optimization problems with long-term budget constraints where the objective functions are not necessarily concave (nor convex) but they instead satisfy the Diminishing Returns (DR) property. Specifically, a sequence of monotone DR-submodular objective functions $\{f_t(x)\}_{t=1}^T$ and monotone linear budget functions $\{\langle p_t,x \rangle \}_{t=1}^T$ arrive over time and assuming a total targeted budget $B_T$, the goal is to choose points $x_t$ at each time $t\in\{1,\dots,T\}$, without knowing $f_t$ and $p_t$ on that step, to achieve sub-linear regret bound while the total budget violation $\sum_{t=1}^T \langle p_t,x_t \rangle -B_T$ is sub-linear as well. Prior work has shown that achieving sub-linear regret is impossible if the budget functions are chosen adversarially. Therefore, we modify the notion of regret by comparing the agent against a $(1-\frac{1}{e})$-approximation to the best fixed decision in hindsight which satisfies the budget constraint proportionally over any window of length $W$. We propose the Online Saddle Point Hybrid Gradient (OSPHG) algorithm to solve this class of online problems. For $W=T$, we recover the aforementioned impossibility result. However, when $W=o(T)$, we show that it is possible to obtain sub-linear bounds for both the $(1-\frac{1}{e})$-regret and the total budget violation.