AITopics | Optimization

Collaborating Authors

Optimization

News Overviews Instructional Materials AI-Alerts Classics

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

Li, Lisha, Jamieson, Kevin, DeSalvo, Giulia, Rostamizadeh, Afshin, Talwalkar, Ameet

arXiv.org Machine LearningNov-23-2016

Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While current methods offer efficiencies by adaptively choosing new configurations to train, an alternative strategy is to adaptively allocate resources across the selected configurations. We formulate hyperparameter optimization as a pure-exploration non-stochastic infinitely many armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations. We introduce Hyperband for this framework and analyze its theoretical properties, providing several desirable guarantees. Furthermore, we compare Hyperband with state-of-the-art methods on a suite of hyperparameter optimization problems. We observe that Hyperband provides five times to thirty times speedup over state-of-the-art Bayesian optimization algorithms on a variety of deep-learning and kernel-based learning problems.

artificial intelligence, configuration, machine learning, (20 more...)

arXiv.org Machine Learning

1603.0656

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Add feedback

On Design Mining: Coevolution and Surrogate Models

Preen, Richard J., Bull, Larry

arXiv.org Artificial IntelligenceNov-23-2016

Design mining [54, 55, 56] is the use of computational intelligence techniques to iteratively search and model the attribute space of physical objects evaluated directly through rapid prototyping to meet given objectives. It enables the exploitation of novel materials and processes without formal models or complex simulation, whilst harnessing the creativity of both computational and human design methods. A sample-model-search-sample loop creates an agile/flexible approach, i.e., primarily test-driven, enabling a continuing process of prototype design consideration and criteria refinement by both producers and users. Computational intelligence techniques have long been used in design, particularly for optimisation within simulations/models. Recent developments in additive-layer manufacturing (3D printing) means that it is now possible to work with over a hundred different materials, from ceramics to cells.

artificial intelligence, evolutionary algorithm, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1162/ARTL_a_00225

1506.08781

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Energy > Renewable > Wind (1.00)
Materials (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

Randomized Distributed Mean Estimation: Accuracy vs Communication

Konečný, Jakub, Richtárik, Peter

arXiv.org Machine LearningNov-22-2016

We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint. Our analysis does not rely on any statistical assumptions about the source of the vectors. This problem arises as a subproblem in many applications, including reduce-all operations within algorithms for distributed and federated optimization and learning. We propose a flexible family of randomized algorithms exploring the trade-off between expected communication cost and estimation error. Our family contains the full-communication and zero-error method on one extreme, and an $\epsilon$-bit communication and ${\cal O}\left(1/(\epsilon n)\right)$ error method on the opposite extreme. In the special case where we communicate, in expectation, a single bit per coordinate of each vector, we improve upon existing results by obtaining $\mathcal{O}(r/n)$ error, where $r$ is the number of bits used to represent a floating point value.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Machine Learning

1611.07555

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Limbo: A Fast and Flexible Library for Bayesian Optimization

Cully, Antoine, Chatzilygeroudis, Konstantinos, Allocati, Federico, Mouret, Jean-Baptiste

arXiv.org Machine LearningNov-22-2016

Bayesian Optimization (BO) is designed for the most challenging ones: when the gradient is unknown, evaluating a solution is costly, and evaluations are noisy. This is, for instance, the case when we want to find optimal parameters for a machine learning algorithm [Snoek et al., 2012], because testing a set of parameters can take hours, and because of the stochastic nature of many machine learning algorithms. Besides parameter tuning, Bayesian optimization recently attracted a lot of interest for direct policy search in robot learning [Lizotte et al., 2007, Wilson et al., 2014, Calandra et al., 2016] and online adaptation; for example, it was recently used to allow a legged robot to learn a new gait after a mechanical damage in about 10-15 trials (2 minutes) [Cully et al., 2015]. At its core, Bayesian optimization builds a probabilistic model of the function to be optimized (the reward/performance/cost function) using the samples that have already been evaluated [Shahriari et al., 2016]; usually, this model is a Gaussian process [Williams and Rasmussen, 2006]. To select the next sample to be evaluated, Bayesian optimization optimizes an acquisition function which leverages the model to predict the most promising areas of the search space.

artificial intelligence, machine learning, optimization, (15 more...)

arXiv.org Machine Learning

1611.07343

Country: Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.49)

Add feedback

A Randomized Rounding Algorithm for Sparse PCA

Fountoulakis, Kimon, Kundu, Abhisek, Kontopoulou, Eugenia-Maria, Drineas, Petros

arXiv.org Machine LearningNov-22-2016

Large matrices are a common way of representing modern, massive datasets, since an m n real-valued matrix X provides a natural structure for encoding information about m objects, each of which is described by n features. Principal Components Analysis (PCA) and the Singular Value Decomposition (SVD) are fundamental data analysis tools, expressing a data matrix in terms of a sequence of orthogonal vectors of decreasing importance. While these vectors enjoy strong optimality properties and are often interpreted as fundamental latent factors that underlie the observed data, they are linear combinations of up to all the data points and features. As a result, they are notoriously difficult to interpret in terms of the underlying processes generating the data [MD09]. The seminal work of [dGJ07] introduced the concept of Sparse PCA, where sparsity constraints are enforced on the singular vectors in order to improve interpretability. As noted in [dGJ07, MD09, PDK13], an example where sparsity implies interpretability is document analysis, where sparse principal components can be mapped to specific topics by inspecting the (few) keywords in their support.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1508.03337

Country: North America > United States > California (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.88)
Information Technology > Data Science (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Add feedback

An Elementary Proof of Convex Phase Retrieval in the Natural Parameter Space via the Linear Program PhaseMax / Compressed Sensing from Phaseless Gaussian Measurements via Linear Programming in the Natural Parameter Space

#artificialintelligenceNov-21-2016, 09:15:20 GMT

The phase retrieval problem has garnered significant attention since the development of the PhaseLift algorithm, which is a convex program that operates in a lifted space of matrices. Because of the substantial computational cost due to lifting, many approaches to phase retrieval have been developed, including non-convex optimization algorithms which operate in the natural parameter space, such as Wirtinger Flow. Very recently, a convex formulation called PhaseMax has been discovered, and it has been proven to achieve phase retrieval via linear programming in the natural parameter space under optimal sample complexity. The current proofs of PhaseMax rely on statistical learning theory or geometric probability theory. Here, we present a short and elementary proof that PhaseMax exactly recovers real-valued vectors from random measurements under optimal sample complexity.

artificial intelligence, machine learning, natural parameter space, (8 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.78)

Add feedback

A new approach to Laplacian solvers and flow problems

Rebeschini, Patrick, Tatikonda, Sekhar

arXiv.org Machine LearningNov-21-2016

This paper investigates message-passing algorithms for solving systems of linear equations in the Laplacian matrices of graphs and to compute electric flows. These two problems are fundamental primitives that arise in several domains such as computer science, electrical engineering, operations research, and machine learning. Despite the extensive literature on approximately solving these problems in quasi-linear time, the algorithms that have been proposed are typically centralized and involve multiple graph theoretic constructions or sampling mechanisms that make them difficult to implement and analyze. On the other hand, message-passing routines are distributed, simple, and easy to implement. In this paper we establish a framework to analyze message-passing algorithms to solve voltage and flow problems. We characterize the error committed by the algorithms in d-regular graphs with equal weights. We show that the convergence of the algorithms is controlled by the total variation distance between the distributions of non-backtracking random walks that start from neighbor nodes. More broadly, our analysis of message-passing introduces new insights to address generic optimization problems with constraints. Laplacian solver, flow problem, message-passing, min-sum, distributed algo-Keywords: rithms.

algorithm, artificial intelligence, survey article, (18 more...)

arXiv.org Machine Learning

1611.07138

Country: North America > United States (0.46)

Genre:

Research Report (0.90)
Overview (0.68)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent

Zheng, Qinqing, Lafferty, John

arXiv.org Machine LearningNov-21-2016

A growing body of recent research is shedding new light on the role of nonconvex optimization for tackling large scale problems in machine learning, signal processing, and convex programming. This work is developing techniques that help to explain the surprising effectiveness of relatively simple first-order algorithms for certain nonconvex optimizations. When applied to problems that can be formulated as semidefinite programs, these techniques can often be viewed as part of a framework proposed by Burer and Monteiro [4]. The Burer-Monteiro technique is based on factoring the semidefinite variable, and applying classical optimization techniques to the resulting nonconvex objective over the factor. While worst-case complexity considerations imply that such an approach cannot succeed in general, a series of recent papers [11, 40, 35, 13, 1] has shown the strategy to be remarkably effective for a number of problems of practical interest, with analytical convergence guarantees and strong empirical performance. In this paper, we enlarge the collection of problems to which the Burer-Monteiro technique can be successfully applied, by analyzing the convergence properties of gradient descent applied to the problem of rectangular matrix completion from incomplete measurements.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1605.07051

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

Faster variational inducing input Gaussian process classification

Izmailov, Pavel, Kropotov, Dmitry

arXiv.org Machine LearningNov-18-2016

Gaussian processes (GP) provide a prior over functions and allow finding complex regularities in data. Gaussian processes are successfully used for classification/regression problems and dimensionality reduction. In this work we consider the classification problem only. The complexity of standard methods for GP-classification scales cubically with the size of the training dataset. This complexity makes them inapplicable to big data problems. Therefore, a variety of methods were introduced to overcome this limitation. In the paper we focus on methods based on so called inducing inputs. This approach is based on variational inference and proposes a particular lower bound for marginal likelihood (evidence). This bound is then maximized w.r.t. parameters of kernel function of the Gaussian process, thus fitting the model to data. The computational complexity of this method is $O(nm^2)$, where $m$ is the number of inducing inputs used by the model and is assumed to be substantially smaller than the size of the dataset $n$. Recently, a new evidence lower bound for GP-classification problem was introduced. It allows using stochastic optimization, which makes it suitable for big data problems. However, the new lower bound depends on $O(m^2)$ variational parameter, which makes optimization challenging in case of big m. In this work we develop a new approach for training inducing input GP models for classification problems. Here we use quadratic approximation of several terms in the aforementioned evidence lower bound, obtaining analytical expressions for optimal values of most of the parameters in the optimization, thus sufficiently reducing the dimension of optimization space. In our experiments we achieve as well or better results, compared to the existing method. Moreover, our method doesn't require the user to manually set the learning rate, making it more practical, than the existing method.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1611.06132

Country: Europe > Russia (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Add feedback

Generalizing diffuse interface methods on graphs: non-smooth potentials and hypergraphs

Bosch, Jessica, Klamt, Steffen, Stoll, Martin

arXiv.org Machine LearningNov-18-2016

Diffuse interface methods have recently been introduced for the task of semi-supervised learning. The underlying model is well-known in materials science but was extended to graphs using a Ginzburg--Landau functional and the graph Laplacian. We here generalize the previously proposed model by a non-smooth potential function. Additionally, we show that the diffuse interface method can be used for the segmentation of data coming from hypergraphs. For this we show that the graph Laplacian in almost all cases is derived from hypergraph information. Additionally, we show that the formerly introduced hypergraph Laplacian coming from a relaxed optimization problem is well suited to be used within the diffuse interface method. We present computational experiments for graph and hypergraph Laplacians.

artificial intelligence, machine learning, segmentation, (17 more...)

arXiv.org Machine Learning

1611.06094

Country:

Europe (0.68)
North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.64)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback