AITopics | Optimization

Efficiently testing local optimality and escaping saddles for ReLU networks

Yun, Chulhee, Sra, Suvrit, Jadbabaie, Ali

arXiv.org Machine Learning, Sep-28-2018

We provide a theoretical algorithm for checking local optimality and escaping saddles at nondifferentiable points of empirical risks of two-layer ReLU networks. Our algorithm receives any parameter value and returns one of the following: a local minimum, a second-order stationary point, or a strict descent direction. The presence of M data points on the nondifferentiability of the ReLU divides the parameter space into at most 2^M regions, which makes analysis difficult. By exploiting polyhedral geometry, we reduce the total computation down to one convex quadratic program (QP) for each hidden node, O(M) (in)equality tests, and one (or a few) nonconvex QPs. For the last QP, we show that our specific problem can be solved efficiently, in spite of nonconvexity. In the benign case, we solve one equality constrained QP, and we prove that projected gradient descent solves it exponentially fast. In the bad case, we have to solve a few more inequality constrained QPs, but we prove that the time complexity is exponential only in the number of inequality constraints. Our experiments show that either the benign case or the bad case with very few inequality constraints occurs, implying that our algorithm is efficient in most cases.

artificial intelligence, extreme ray, machine learning, (15 more...)

1809.10858

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Adaptive Gaussian process surrogates for Bayesian inference

Takhtaganov, Timur, Müller, Juliane

arXiv.org Machine Learning, Sep-27-2018

We present an adaptive approach to the construction of Gaussian process surrogates for Bayesian inference with expensive-to-evaluate forward models. Our method relies on the fully Bayesian approach to training Gaussian process models and utilizes the expected improvement idea from Bayesian global optimization. We adaptively construct training designs by maximizing the expected improvement in fit of the Gaussian process model to the noisy observational data. Numerical experiments on model problems with synthetic data demonstrate the effectiveness of the obtained adaptive designs compared to the fixed non-adaptive designs in terms of accurate posterior estimation at a fraction of the cost of inference with forward models.
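As a point of reference for the "expected improvement idea from Bayesian global optimization" mentioned above, here is a minimal sketch of the classic expected improvement (EI) acquisition for minimization. It is not the paper's criterion, which targets improvement in fit of the Gaussian process to the observational data; the function names and the closed form used here are the textbook version and are assumptions of this illustration.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """Classic EI for minimization (illustrative, not the paper's criterion).

    mu, sigma: Gaussian process posterior mean and standard deviation
               at a candidate point.
    best:      smallest objective value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * normal_cdf(z) + sigma * normal_pdf(z)
```

In an adaptive design loop, one would typically evaluate this quantity over candidate points and add the maximizer to the training set; how the candidates are generated is left open here.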

gp model, optimization problem, upstream oil & gas, (19 more...)

1809.10784

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Fast Stochastic Algorithms for Low-rank and Nonsmooth Matrix Problems

Garber, Dan, Kaplan, Atara

arXiv.org Machine Learning, Sep-27-2018

Composite convex optimization problems which include both a nonsmooth term and a low-rank promoting term have important applications in machine learning and signal processing, such as when one wishes to recover an unknown matrix that is simultaneously low-rank and sparse. However, such problems are highly challenging to solve at large scale: the low-rank promoting term prohibits efficient implementations of proximal methods for composite optimization and even simple subgradient methods. On the other hand, methods which are tailored for low-rank optimization, such as conditional gradient-type methods, which are often applied to a smooth approximation of the nonsmooth objective, are slow since their runtime scales with both the large Lipschitz parameter of the smoothed gradient vector and with $1/\epsilon$. In this paper we develop efficient algorithms for stochastic optimization of a strongly-convex objective which includes both a nonsmooth term and a low-rank promoting term. In particular, to the best of our knowledge, we present the first algorithm that enjoys all of the following critical properties for large-scale problems: i) (nearly) optimal sample complexity, ii) each iteration requires only a single low-rank SVD computation, and iii) the overall number of thin-SVD computations scales only with $\log(1/\epsilon)$ (as opposed to $\textrm{poly}(1/\epsilon)$ in previous methods). We also give an algorithm for the closely-related finite-sum setting. At the heart of our results lies a novel combination of a variance-reduction technique and the use of a weak-proximal oracle, which is key to obtaining all three of the above properties simultaneously.

artificial intelligence, computation, machine learning, (16 more...)

1809.10477

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Boosting Trust Region Policy Optimization by Normalizing Flows Policy

Tang, Yunhao, Agrawal, Shipra

arXiv.org Artificial Intelligence, Sep-26-2018

We propose to improve trust region policy search with normalizing flows policy. We illustrate that when the trust region is constructed by KL divergence constraint, normalizing flows policy can generate samples far from the 'center' of the previous policy iterate, which potentially enables better exploration and helps avoid bad local optima. We show that normalizing flows policy significantly improves upon factorized Gaussian policy baseline, with both TRPO and ACKTR, especially on tasks with complex dynamics such as Humanoid.

artificial intelligence, flow policy, machine learning, (18 more...)

1809.10326

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

Chi, Yuejie, Lu, Yue M., Chen, Yuxin

arXiv.org Machine Learning, Sep-25-2018

Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. The theoretical footings, however, had been largely lacking until recently. In this tutorial-style overview, we highlight the important role of statistical models in enabling efficient nonconvex optimization with performance guarantees. We review two contrasting approaches: (1) two-stage algorithms, which consist of a tailored initialization step followed by successive refinement; and (2) global landscape analysis and initialization-free algorithms. Several canonical matrix factorization problems are discussed, including but not limited to matrix sensing, phase retrieval, matrix completion, blind deconvolution, robust principal component analysis, phase synchronization, and joint alignment. Special care is taken to illustrate the key technical insights underlying their analyses. This article serves as a testament that the integrated thinking of optimization and statistics leads to fruitful research findings.

artificial intelligence, machine learning, phase retrieval, (13 more...)

1809.09573

Country: North America > United States (1.00)

Genre:

Overview (0.65)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

Playing the Game of Universal Adversarial Perturbations

Perolat, Julien, Malinowski, Mateusz, Piot, Bilal, Pietquin, Olivier

arXiv.org Machine Learning, Sep-25-2018

We study the problem of learning classifiers robust to universal adversarial perturbations. While prior work approaches this problem via robust optimization, adversarial training, or input transformation, we instead phrase it as a two-player zero-sum game. In this new formulation, both players simultaneously play the same game, where one player chooses a classifier that minimizes a classification loss whilst the other player creates an adversarial perturbation that increases the same loss when applied to every sample in the training set. By observing that performing a classification (respectively creating adversarial samples) is the best response to the other player, we propose a novel extension of a game-theoretic algorithm, namely fictitious play, to the domain of training robust classifiers. Finally, we empirically show the robustness and versatility of our approach in two defence scenarios where universal attacks are performed on several image classification datasets: CIFAR10, CIFAR100 and ImageNet.
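For readers unfamiliar with fictitious play, the following is a minimal sketch of the classical algorithm on a small zero-sum matrix game. It is not the paper's extension to classifier training; the function name, the initial plays, and the matching-pennies payoff matrix in the usage note are assumptions chosen for illustration. Each round, both players simultaneously best-respond to the empirical frequencies of the opponent's past actions.

```python
def fictitious_play(payoff, rounds):
    """Classical fictitious play on a zero-sum matrix game (illustrative).

    payoff[i][j] is the amount the row player (maximizer) receives from
    the column player (minimizer) when they play actions i and j.
    Returns the empirical action frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Arbitrary initial plays so that the empirical history is non-empty.
    row_counts[0] = 1
    col_counts[0] = 1

    for _ in range(rounds):
        # Value of each row action against the column player's history.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Value (to the row player) of each column action against the
        # row player's history; the column player wants this small.
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Simultaneous best responses (ties broken by lowest index).
        row_counts[row_values.index(max(row_values))] += 1
        col_counts[col_values.index(min(col_values))] += 1

    row_total = sum(row_counts)
    col_total = sum(col_counts)
    return (
        [c / row_total for c in row_counts],
        [c / col_total for c in col_counts],
    )
```

For matching pennies, `fictitious_play([[1, -1], [-1, 1]], 20000)` yields empirical frequencies close to the uniform mixed equilibrium for both players.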

artificial intelligence, machine learning, perturbation, (20 more...)

1809.07802

Genre: Research Report (0.50)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
(2 more...)

Provably Correct Automatic Subdifferentiation for Qualified Programs

Kakade, Sham, Lee, Jason D.

arXiv.org Machine Learning, Sep-23-2018

The Cheap Gradient Principle (Griewank 2008), which states that the computational cost of computing the gradient of a scalar-valued function is nearly the same (often within a factor of $5$) as that of simply computing the function itself, is of central importance in optimization; it allows us to quickly obtain (high dimensional) gradients of scalar loss functions which are subsequently used in black box gradient-based optimization procedures. The current state of affairs is markedly different with regard to computing subderivatives: widely used ML libraries, including TensorFlow and PyTorch, do not correctly compute (generalized) subderivatives even on simple examples. This work considers the question: is there a Cheap Subgradient Principle? Our main result shows that, under certain restrictions on our library of nonsmooth functions (standard in nonlinear programming), provably correct generalized subderivatives can be computed at a computational cost that is within a (dimension-free) factor of $6$ of the cost of computing the scalar function itself.
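To illustrate the kind of failure the abstract alludes to, here is a small sketch in plain Python that applies the chain rule by hand to f(x) = relu(x) - relu(-x), which equals x everywhere and so has derivative 1. The convention relu'(0) = 0 used below is an assumption of this sketch (it mirrors a common autodiff convention), and the helper names are hypothetical; no library code is being reproduced.

```python
def relu(x):
    return max(x, 0.0)


def relu_grad(x):
    # Assumed convention for this sketch: the "derivative" of relu at 0 is 0.
    return 1.0 if x > 0 else 0.0


def f(x):
    # Algebraically, relu(x) - relu(-x) == x for every real x.
    return relu(x) - relu(-x)


def f_grad_chain_rule(x):
    # Chain rule applied term by term:
    #   d/dx relu(x)  =  relu'(x)
    #   d/dx relu(-x) = -relu'(-x)
    return relu_grad(x) - (-1.0) * relu_grad(-x)


# Away from zero the chain rule agrees with the true derivative (1.0),
# but at x = 0 it returns 0.0, which is not a valid subderivative of f.
print(f_grad_chain_rule(1.0), f_grad_chain_rule(-1.0), f_grad_chain_rule(0.0))
```

The point of the sketch is only that composing per-operation conventions at a kink can yield a value outside the generalized subdifferential of the composite function.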

artificial intelligence, machine learning, optimization problem, (18 more...)

1809.0853

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Generic Framework for Interesting Subspace Cluster Detection in Multi-attributed Networks

Chen, Feng, Zhou, Baojian, Alim, Adil, Zhao, Liang

arXiv.org Artificial Intelligence, Sep-20-2018

Detection of interesting (e.g., coherent or anomalous) clusters has been studied extensively on plain or univariate networks, with various applications. Recently, algorithms have been extended to real-world networks with multiple attributes for each node. In a multi-attributed network, a cluster of nodes is often only interesting for a subset (subspace) of attributes, and clusters of this type are called subspace clusters. However, in the current literature, few methods are capable of detecting subspace clusters, which involves concurrent feature selection and network cluster detection. These relevant methods are mostly heuristic-driven and customized for specific application scenarios. In this work, we present a generic and theoretical framework for detection of interesting subspace clusters in large multi-attributed networks. Specifically, we propose a subspace graph-structured matching pursuit algorithm, namely, SG-Pursuit, to address a broad class of such problems for different score functions (e.g., coherence or anomalous functions) and topology constraints (e.g., connected subgraphs and dense subgraphs). We prove that our algorithm 1) runs in nearly-linear time in the network size and the total number of attributes and 2) enjoys rigorous guarantees (geometrical convergence rate and tight error bound) analogous to those of the state-of-the-art algorithms for sparse feature selection problems and subgraph detection problems. As a case study, we specialize SG-Pursuit to optimize a number of well-known score functions for two typical tasks, including detection of coherent dense and anomalous connected subspace clusters in real-world networks. Empirical evidence demonstrates that our proposed generic algorithm SG-Pursuit outperforms state-of-the-art methods that are designed specifically for these two tasks.

data mining, machine learning, subspace cluster, (17 more...)

1709.05246

Country: North America > United States (0.29)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.68)
Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Benchmarking five global optimization approaches for nano-optical shape optimization and parameter reconstruction

Schneider, Philipp-Immanuel, Santiago, Xavier Garcia, Soltwisch, Victor, Hammerschmidt, Martin, Burger, Sven, Rockstuhl, Carsten

arXiv.org Machine Learning, Sep-18-2018

Numerical optimization is an important tool in the field of computational physics in general and in nano-optics in particular. It has attracted attention with the increase in complexity of structures that can be realized with today's nano-fabrication technologies, for which a rational design is no longer feasible. Also, numerical resources are available to enable computational photonic material design and to identify structures that meet predefined optical properties for specific applications. However, the optimization objective function is in general non-convex and its computation remains resource-demanding, so that the right choice of optimization method is crucial to obtain excellent results. Here, we benchmark five global optimization methods for three typical nano-optical optimization problems from the field of shape optimization and parameter reconstruction: downhill simplex optimization, the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm, particle swarm optimization, differential evolution, and Bayesian optimization. In these examples, Bayesian optimization, mainly known from machine learning applications, obtains significantly better results in a fraction of the run times of the other optimization methods.

artificial intelligence, optimization, optimization problem, (16 more...)

1809.06674

Country: Europe > Germany (0.28)

Genre:

Research Report (0.64)
Overview (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Phase Transitions of the Typical Algorithmic Complexity of the Random Satisfiability Problem Studied with Linear Programming

Schawe, Hendrik, Bleim, Roman, Hartmann, Alexander K.

arXiv.org Artificial Intelligence, Sep-18-2018

Here we study the NP-complete $K$-SAT problem. Although the worst-case complexity of NP-complete problems is conjectured to be exponential, there exist parametrized random ensembles of problems where solutions can typically be found in polynomial time for suitable ranges of the parameter. In fact, random $K$-SAT, with $\alpha=M/N$ as control parameter, can be solved quickly for small enough values of $\alpha$. It shows a phase transition between a satisfiable phase and an unsatisfiable phase. For branch and bound algorithms, which operate in the space of feasible Boolean configurations, the empirically hardest problems are located only close to this phase transition. Here we study $K$-SAT ($K=3,4$) and the related optimization problem MAX-SAT by a linear programming approach, which is widely used for practical problems and allows for polynomial run time. In contrast to branch and bound it operates outside the space of feasible configurations. On the other hand, finding a solution within polynomial time is not guaranteed. We investigated several variants, such as including artificial objective functions, so-called cutting-plane approaches, and a mapping to the NP-complete vertex-cover problem. We observed several easy-hard transitions, from where the problems are typically solvable (in polynomial time) using the given algorithms, respectively, to where they are not solvable in polynomial time. For the related vertex-cover problem on random graphs these easy-hard transitions can be identified with structural properties of the graphs, like percolation transitions. For the present random $K$-SAT problem we have investigated numerous structural properties also exhibiting clear transitions, but they appear not to be correlated with the easy-hard transitions observed here. This renders the behaviour of random $K$-SAT more complex than, e.g., the vertex-cover problem.
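To make the control parameter concrete, here is a minimal sketch of how a random $K$-SAT instance with $N$ variables and $M = \alpha N$ clauses could be drawn, together with a brute-force satisfiability check for tiny instances. The function names, the literal encoding (signed integers), and the exhaustive search are assumptions of this illustration; the paper's linear programming approach is not reproduced here.

```python
import itertools
import random


def random_ksat(n_vars, alpha, k=3, seed=0):
    """Draw a random K-SAT instance with M = round(alpha * N) clauses.

    A clause is a tuple of nonzero integers: +v means variable v,
    -v means its negation (variables are numbered 1..n_vars).
    """
    rng = random.Random(seed)
    n_clauses = round(alpha * n_vars)
    clauses = []
    for _ in range(n_clauses):
        # K distinct variables per clause, each negated with probability 1/2.
        variables = rng.sample(range(1, n_vars + 1), k)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


def is_satisfiable(clauses, n_vars):
    """Exhaustive check over all 2^N assignments (only for very small N)."""
    for bits in itertools.product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False
```

Sweeping `alpha` for fixed small `n_vars` and averaging `is_satisfiable` over many seeds gives a rough picture of the satisfiable and unsatisfiable phases, although finite-size effects are strong at sizes where exhaustive search is feasible.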

artificial intelligence, optimization problem, transition, (17 more...)

1702.02821

Country:

Europe > Germany (0.68)
North America > United States (0.67)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
