AITopics

In this paper we introduce a theoretical framework for semi-discrete optimization using ideas from optimal transport. Our primary motivation is in the field of deep learning, and specifically in the task of neural architecture search. With this aim in mind, we discuss the geometric and theoretical motivation for new techniques for neural architecture search (in the companion work \cite{practical}, we show that algorithms inspired by our framework are competitive with contemporaneous methods). We introduce a Riemannian-like metric on the space of probability measures over a semi-discrete space $\mathbb{R}^d \times \mathcal{G}$, where $\mathcal{G}$ is a finite weighted graph. With this Riemannian structure in hand, we derive formal expressions for the gradient flow of a relative entropy functional, as well as second-order dynamics for the optimization of said energy. Then, with the aim of providing a rigorous motivation for the formally derived gradient flow equations, we also consider an iterative procedure known as the minimizing movement scheme (i.e., the implicit Euler scheme, or JKO scheme) and apply it to the relative entropy with respect to a suitable cost function. For some specific choices of metric and cost, we rigorously show that the minimizing movement scheme of the relative entropy functional converges to the gradient flow process provided by the formal Riemannian structure. This flow coincides with a system of reaction-diffusion equations on $\mathbb{R}^d$.

deep learning, neural network, upstream oil & gas, (21 more...)

2006.15221

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Maryland > Prince George's County > College Park (0.14)
Europe > Sweden (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

arXiv.org Artificial Intelligence, Jun-26-2020

Algorithm for Computing Approximate Nash equilibrium in Continuous Games with Application to Continuous Blotto

Ganzfried, Sam

Successful algorithms have been developed for computing Nash equilibrium in a variety of finite game classes. However, solving continuous games---in which the pure strategy space is (potentially uncountably) infinite---is far more challenging. Nonetheless, many real-world domains have continuous action spaces, e.g., where actions refer to an amount of time, money, or other resource that is naturally modeled as being real-valued as opposed to integral. We present a new algorithm for computing Nash equilibrium strategies in continuous games. In addition to two-player zero-sum games, our algorithm also applies to multiplayer games and games of imperfect information. We experiment with our algorithm on a continuous imperfect-information Blotto game, in which two players distribute resources over multiple battlefields. Blotto games have frequently been used to model national security scenarios and have also been applied to electoral competition and auction theory. Experiments show that our algorithm is able to quickly compute close approximations of Nash equilibrium strategies for this game.

algorithm, artificial intelligence, optimization problem, (18 more...)

arXiv.org Artificial Intelligence

2006.07443

Country:

North America > United States > Texas (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games (1.00)
Government > Military (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Li, Jiajin, So, Anthony Man-Cho, Ma, Wing-Kin

Understanding Notions of Stationarity in Non-Smooth Optimization

Many contemporary applications in signal processing and machine learning give rise to structured non-convex non-smooth optimization problems that can often be tackled by simple iterative methods quite effectively. One of the keys to understanding such a phenomenon---and, in fact, one of the very difficult conundrums even for experts---lies in the study of "stationary points" of the problem in question. Unlike smooth optimization, for which the definition of a stationary point is rather standard, there is a myriad of definitions of stationarity in non-smooth optimization. In this article, we give an introduction to different stationarity concepts for several important classes of non-convex non-smooth functions, discuss their geometric interpretations, and further clarify the relationships among these different concepts. We then demonstrate the relevance of these constructions in some representative applications and how they could affect the performance of iterative methods for tackling these applications.

application, convex function, subdifferential, (13 more...)

2006.14901

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.14)
Asia > China > Hong Kong (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report (0.50)
Overview (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Ying, Jiaxi, Cardoso, José Vinícius de M., Palomar, Daniel P.

Does the $\ell_1$-norm Learn a Sparse Graph under Laplacian Constrained Graphical Models?

We consider the problem of learning a sparse graph under Laplacian constrained Gaussian graphical models. This problem can be formulated as a penalized maximum likelihood estimation of the precision matrix under Laplacian structural constraints. As in the classical graphical lasso problem, recent works made use of the $\ell_1$-norm regularization with the goal of promoting sparsity in Laplacian structural precision matrix estimation. However, we find that the widely used $\ell_1$-norm is not effective in imposing a sparse solution in this problem. Through empirical evidence, we observe that the number of nonzero graph weights grows as the regularization parameter increases. From a theoretical perspective, we prove that a large regularization parameter will surprisingly lead to a fully connected graph. To address this issue, we propose a nonconvex estimation method by solving a sequence of weighted $\ell_1$-norm penalized sub-problems and prove that the statistical error of the proposed estimator matches the minimax lower bound. To solve each sub-problem, we develop a projected gradient descent algorithm that enjoys a linear convergence rate. Numerical experiments involving synthetic and real-world data sets from the recent COVID-19 pandemic and financial stock markets demonstrate the effectiveness of the proposed method. An open source $\mathsf{R}$ package containing the code for all the experiments is available at https://github.com/mirca/sparseGraph.

graph, laplacian, lemma 5, (16 more...)

2006.14925

Country:

Asia > China > Hong Kong (0.04)
South America > Brazil (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.49)

Industry:

Banking & Finance > Trading (0.66)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Kovalev, Maxim S., Utkin, Lev V.

Counterfactual explanation of machine learning survival models

A method for counterfactual explanation of machine learning survival models is proposed. One of the difficulties of solving the counterfactual explanation problem is that the classes of examples are implicitly defined through outcomes of a machine learning survival model in the form of survival functions. A condition that establishes the difference between survival functions of the original example and the counterfactual is introduced. This condition is based on using a distance between mean times to event. It is shown that the counterfactual explanation problem can be reduced to a standard convex optimization problem with linear constraints when the explained black-box model is the Cox model. For other black-box models, it is proposed to apply the well-known Particle Swarm Optimization algorithm. Numerical experiments with real and synthetic data illustrate the proposed method.

evolutionary algorithm, explanation, machine learning, (19 more...)

2006.16793

Country:

Asia > Russia (0.14)
North America > United States > New Jersey (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
(3 more...)

Covariance-engaged Classification of Sets via Linear Programming

Ren, Zhao, Jung, Sungkyu, Qiao, Xingye

Set classification aims to classify a set of observations as a whole, as opposed to classifying individual observations separately. To formally understand the unfamiliar concept of binary set classification, we first investigate the optimal decision rule under the normal distribution, which utilizes the empirical covariance of the set to be classified. We show that the number of observations in the set plays a critical role in bounding the Bayes risk. Under this framework, we further propose new methods of set classification. For the case where only a few parameters of the model drive the difference between two classes, we propose a computationally efficient approach to parameter estimation using linear programming, leading to the Covariance-engaged LInear Programming Set (CLIPS) classifier. Its theoretical properties are investigated for both the independent case and various (short-range and long-range dependent) time series structures among observations within each set. The convergence rates of the estimation errors and the risk of the CLIPS classifier are established to show that having multiple observations in a set leads to faster convergence rates, compared to the standard classification situation in which there is only one observation in the set. The applicable domains in which the CLIPS classifier performs better than competitors are highlighted in a comprehensive simulation study. Finally, we illustrate the usefulness of the proposed methods in the classification of real image data in histopathology.

artificial intelligence, bayesian inference, machine learning, (20 more...)

2006.14831

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > New York > Broome County > Binghamton (0.04)
Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)

França, Guilherme, Jordan, Michael I., Vidal, René

On Dissipative Symplectic Integration with Applications to Gradient-Based Optimization

arXiv.org Machine Learning, Jun-25-2020

Recently, continuous dynamical systems have proved useful in providing conceptual and quantitative insights into gradient-based optimization, widely used in modern machine learning and statistics. An important question that arises in this line of work is how to discretize the system in such a way that its stability and rates of convergence are preserved. In this paper we propose a geometric framework in which such discretizations can be realized systematically, enabling the derivation of "rate-matching" optimization algorithms without the need for a discrete convergence analysis. More specifically, we show that a generalization of symplectic integrators to dissipative Hamiltonian systems is able to preserve continuous rates of convergence up to a controlled error. Moreover, such methods preserve a perturbed Hamiltonian despite the absence of a conservation law, extending key results of symplectic integrators to dissipative cases. Our arguments rely on a combination of backward error analysis with fundamental results from symplectic geometry.

artificial intelligence, integrator, optimization problem, (18 more...)

2004.0684

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > Maryland > Baltimore (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Steinhoff, Vera, Kerschke, Pascal, Grimme, Christian

Empirical Study on the Benefits of Multiobjectivization for Solving Single-Objective Problems

arXiv.org Artificial Intelligence, Jun-25-2020

Whether in production, logistics, medicine, or biology, the globally optimal solution, or the set of globally optimal solutions, is sought everywhere. However, most real-world problems are nonlinear and naturally multimodal, which poses severe problems for global optimization. Multimodality, the existence of multiple (local) optima, is regarded as one of the biggest challenges for continuous single-objective problems [23]. Many algorithms get stuck while searching for the global optimum or require many function evaluations to escape local optima. One of the most popular strategies for dealing with multimodal problems is the use of population-based methods such as evolutionary algorithms, owing to their global search abilities [2]. In this paper we examine another approach to coping with local traps, namely multiobjectivization. By transforming a single-objective problem into a multi-objective one, we aim to exploit the properties of multi-objective landscapes. So far, the characteristics of single-objective optimization problems have often been directly transferred to the multiobjective domain.

artificial intelligence, mogsa, optimization problem, (17 more...)

2006.14423

Country:

Europe > Austria > Vienna (0.14)
Europe > Germany > North Rhine-Westphalia > Münster Region > Münster (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

arXiv.org Artificial Intelligence, Jun-25-2020

Optimizing AI for Teamwork

Bansal, Gagan, Nushi, Besmira, Kamar, Ece, Horvitz, Eric, Weld, Daniel S.

In many high-stakes domains such as criminal justice, finance, and healthcare, AI systems may recommend actions to a human expert responsible for final decisions, a context known as AI-advised decision making. When AI practitioners deploy the most accurate system in these domains, they implicitly assume that the system will function alone in the world. We argue that the most accurate AI teammate is not necessarily the best teammate; for example, predictable performance is worth a slight sacrifice in AI accuracy. So, we propose training AI systems in a human-centered manner and directly optimizing for team performance. We study this proposal for a specific type of human-AI team, where the human overseer chooses to accept the AI recommendation or solve the task themselves. To optimize the team performance we maximize the team's expected utility, expressed in terms of quality of the final decision, cost of verifying, and individual accuracies. Our experiments with linear and non-linear models on real-world, high-stakes datasets show that the improvements in utility, while small and varying across datasets and parameters (such as cost of mistake), are real and consistent with our definition of team utility. We discuss the shortcoming of current optimization approaches beyond well-studied loss functions such as log-loss, and encourage future work on human-centered optimization problems motivated by human-AI collaborations.

artificial intelligence, data mining, machine learning, (18 more...)

2004.13102

Country: North America > United States > Washington (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Abdelzad, Vahdat, Czarnecki, Krzysztof, Salay, Rick

The Effect of Optimization Methods on the Robustness of Out-of-Distribution Detection Approaches

arXiv.org Machine Learning, Jun-25-2020

Deep neural networks (DNNs) have become the de facto learning mechanism in different domains. Their tendency to perform unreliably on out-of-distribution (OOD) inputs hinders their adoption in critical domains. Several approaches have been proposed for detecting OOD inputs. However, existing approaches still lack robustness. In this paper, we shed light on the robustness of OOD detection (OODD) approaches by revealing the important role of optimization methods. We show that OODD approaches are sensitive to the type of optimization method used during training deep models. Optimization methods can provide different solutions to a non-convex problem and so these solutions may or may not satisfy the assumptions (e.g., distributions of deep features) made by OODD approaches. Furthermore, we propose a robustness score that takes into account the role of optimization methods. This provides a sound way to compare OODD approaches. In addition to comparing several OODD approaches using our proposed robustness score, we demonstrate that some optimization methods provide better solutions for OODD approaches.

artificial intelligence, machine learning, oodd approach, (17 more...)

2006.14584

Country: North America > United States (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)