AITopics | Optimization

Optimization

@machinelearnbot, Jun-6-2017, 15:20:15 GMT

In this paper, we propose Ensemble Bayesian Optimization (EBO) to overcome this problem. Unlike conventional BO methods that operate on a single posterior GP model, EBO works with an ensemble of posterior GP models. Our approach generates speedups by parallelizing the time-consuming hyper-parameter posterior inference and functional evaluations on hundreds of cores and aggregating the models in every iteration of BO. We demonstrate the ability of EBO to handle sample-intensive hard optimization problems by applying it to a rover navigation problem with tens of thousands of observations.

optimization, optimization problem, social media, (6 more...)

Industry: Information Technology > Services (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Communications > Social Media (0.40)

Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space

Hernández-Lobato, José Miguel, Requeima, James, Pyzer-Knapp, Edward O., Aspuru-Guzik, Alán

arXiv.org Machine Learning, Jun-6-2017

Chemical space is so large that brute force searches for new interesting molecules are infeasible. High-throughput virtual screening via computer cluster simulations can speed up the discovery process by collecting very large amounts of data in parallel, e.g., up to hundreds or thousands of parallel measurements. Bayesian optimization (BO) can produce additional acceleration by sequentially identifying the most useful simulations or experiments to be performed next. However, current BO methods cannot scale to the large numbers of parallel measurements and the massive libraries of molecules currently used in high-throughput screening. Here, we propose a scalable solution based on a parallel and distributed implementation of Thompson sampling (PDTS). We show that, in small-scale problems, PDTS performs similarly to parallel expected improvement (EI), a batch version of the most widely used BO heuristic. Additionally, in settings where parallel EI does not scale, PDTS outperforms other scalable baselines such as a greedy search, $\epsilon$-greedy approaches and a random search method. These results show that PDTS is a successful solution for large-scale parallel BO.
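As a rough illustration of the batch Thompson sampling idea described in this abstract, the sketch below draws one posterior sample per slot in the batch and lets each sample pick its own best candidate from a finite pool. It is not the authors' PDTS implementation: a Bayesian linear regression model stands in for whatever model the paper uses, and the function name `parallel_thompson_batch`, the feature matrices, and the `noise_var` and `prior_var` parameters are all assumptions made for this example.

```python
import numpy as np

def parallel_thompson_batch(X_obs, y_obs, X_pool, batch_size,
                            noise_var=0.1, prior_var=1.0, rng=None):
    """Pick `batch_size` distinct pool indices, one per posterior sample.

    Assumes a Bayesian linear model y = x @ w + noise (illustrative choice).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = X_pool.shape[1]

    # Gaussian posterior over the weights of Bayesian linear regression.
    precision = np.eye(d) / prior_var + X_obs.T @ X_obs / noise_var
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0  # symmetrize against round-off
    mean = cov @ X_obs.T @ y_obs / noise_var

    chosen = []
    available = np.ones(len(X_pool), dtype=bool)
    for _ in range(batch_size):
        # Each iteration plays the role of one parallel worker:
        # draw a single posterior sample and act greedily on it.
        w = rng.multivariate_normal(mean, cov)
        scores = X_pool @ w
        scores[~available] = -np.inf  # never pick the same candidate twice
        idx = int(np.argmax(scores))
        chosen.append(idx)
        available[idx] = False
    return chosen
```

In a distributed setting each loop iteration could run on a separate worker, since the samples are drawn independently from the same posterior; the sequential loop here only serves to keep the selected candidates distinct.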

artificial intelligence, machine learning, optimization problem, (14 more...)

1706.01825

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.46)

Genre: Research Report > New Finding (0.34)

Industry:

Energy > Renewable (0.47)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Fast rates for online learning in Linearly Solvable Markov Decision Processes

Neu, Gergely, Gómez, Vicenç

arXiv.org Machine Learning, Jun-6-2017

We study the problem of online learning in a class of Markov decision processes known as linearly solvable MDPs. In the stationary version of this problem, a learner interacts with its environment by directly controlling the state transitions, attempting to balance a fixed state-dependent cost and a certain smooth cost penalizing extreme control inputs. In the current paper, we consider an online setting where the state costs may change arbitrarily between consecutive rounds, and the learner only observes the costs at the end of each respective round. We are interested in constructing algorithms for the learner that guarantee small regret against the best stationary control policy chosen in full knowledge of the cost sequence. Our main result is showing that the smoothness of the control cost enables the simple algorithm of following the leader to achieve a regret of order $\log^2 T$ after $T$ rounds, vastly improving on the best known regret bound of order $T^{3/4}$ for this setting.
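The "following the leader" rule named in this abstract can be illustrated on a much simpler problem than linearly solvable MDPs. The sketch below is only a toy instance under assumed squared losses $f_t(x) = (x - z_t)^2$, where the leader after $t$ rounds is the running mean of the observed targets. The function names and the choice of loss are assumptions for illustration and do not reproduce the paper's setting or its regret analysis.

```python
def follow_the_leader(targets, x0=0.0):
    """Play, in each round, the minimizer of the cumulative past loss.

    For squared losses (x - z)^2 that minimizer is the mean of past targets.
    `x0` is the arbitrary first play, made before any loss is observed.
    """
    plays = []
    total = 0.0
    for t, z in enumerate(targets):
        x = x0 if t == 0 else total / t
        plays.append(x)
        total += z  # the loss for round t is revealed only after playing
    return plays

def regret(targets, plays):
    """Cumulative loss of the plays minus that of the best fixed point."""
    best = sum(targets) / len(targets)
    learner = sum((x - z) ** 2 for x, z in zip(plays, targets))
    comparator = sum((best - z) ** 2 for z in targets)
    return learner - comparator
```

For example, `follow_the_leader([1.0, 3.0, 5.0])` plays `0.0`, then `1.0`, then `2.0`, since each play is the mean of the targets seen so far.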

algorithm, artificial intelligence, machine learning, (14 more...)

1702.06341

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.63)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration

Chen, Lijie, Gupta, Anupam, Li, Jian, Qiao, Mingda, Wang, Ruosong

arXiv.org Machine Learning, Jun-4-2017

We study the combinatorial pure exploration problem Best-Set in stochastic multi-armed bandits. In a Best-Set instance, we are given $n$ arms with unknown reward distributions, as well as a family $\mathcal{F}$ of feasible subsets over the arms. Our goal is to identify the feasible subset in $\mathcal{F}$ with the maximum total mean using as few samples as possible. The problem generalizes the classical best arm identification problem and the top-$k$ arm identification problem, both of which have attracted significant attention in recent years. We provide a novel instance-wise lower bound for the sample complexity of the problem, as well as a nontrivial sampling algorithm, matching the lower bound up to a factor of $\ln|\mathcal{F}|$. For an important class of combinatorial families, we also provide a polynomial-time implementation of the sampling algorithm, using the equivalence of separation and optimization for convex programs, and approximate Pareto curves in multi-objective optimization. We also show that the $\ln|\mathcal{F}|$ factor is inevitable in general through a nontrivial lower bound construction. Our results significantly improve several previous results for several important combinatorial constraints, and provide a tighter understanding of the general Best-Set problem. We further introduce an even more general problem, formulated in geometric terms. We are given $n$ Gaussian arms with unknown means and unit variance. Consider the $n$-dimensional Euclidean space $\mathbb{R}^n$, and a collection $\mathcal{O}$ of disjoint subsets. Our goal is to determine the subset in $\mathcal{O}$ that contains the $n$-dimensional vector of the means. The problem generalizes most pure exploration bandit problems studied in the literature. We provide the first nearly optimal sample complexity upper and lower bounds for the problem.

algorithm, optimal sampling algorithm, sample complexity, (13 more...)

1706.01081

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Experimental Design for Non-Parametric Correction of Misspecified Dynamical Models

Shulkind, Gal, Horesh, Lior, Avron, Haim

arXiv.org Machine Learning, Jun-4-2017

We consider a class of misspecified dynamical models where the governing term is only approximately known. Under the assumption that observations of the system's evolution are accessible for various initial conditions, our goal is to infer a non-parametric correction to the misspecified driving term so as to faithfully represent the system dynamics and devise system evolution predictions for unobserved initial conditions. We model the unknown correction term as a Gaussian Process and analyze the problem of efficient experimental design to find an optimal correction term under constraints such as a limited experimental budget. We suggest a novel formulation for experimental design for this Gaussian Process and show that approximately optimal (up to a constant factor) designs may be efficiently derived by utilizing results from the literature on submodular optimization. Our numerical experiments exemplify the effectiveness of these techniques.

artificial intelligence, experiment, machine learning, (18 more...)

1705.00956

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Practical Coreset Constructions for Machine Learning

Bachem, Olivier, Lucic, Mario, Krause, Andreas

arXiv.org Machine Learning, Jun-4-2017

Over the last few years, the world has witnessed the emergence of data sets of an unprecedented size across different scientific disciplines. The large volume of such data sets presents new challenges as gathering, storing, and analyzing them becomes expensive. In the context of millions or even billions of data points, existing proven algorithms "suddenly" become computationally infeasible, while data sets may no longer fit on single machines and must instead be stored on clusters of machines. As a consequence, new algorithms are required to scale to this massive data setting. While one could focus on single machine learning problems and come up with endless new algorithms, we focus on a more general approach: we investigate coresets -- succinct, small summaries of large data sets -- so that solutions found on the summary are provably competitive with solutions found on the full data set.
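To make the notion of a coreset as a small weighted summary concrete, here is a minimal importance-sampling sketch. The sampling distribution (half uniform, half proportional to squared distance from the data mean) is an assumption borrowed from the lightweight-coreset idea for k-means-style objectives. It is not claimed to be the construction analyzed in the paper above, and the function name `lightweight_coreset` is hypothetical.

```python
import numpy as np

def lightweight_coreset(X, m, rng=None):
    """Sample m weighted points whose weighted cost approximates the full cost.

    Assumed sampling distribution: q(x) = 1/(2n) + d(x, mean)^2 / (2 * total).
    Each sampled point gets weight 1 / (m * q(x)), so weighted sums over the
    sample are unbiased estimates of sums over the full data set.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    sq_dist = np.sum((X - X.mean(axis=0)) ** 2, axis=1)
    total = sq_dist.sum()
    if total > 0:
        q = 0.5 / n + 0.5 * sq_dist / total
    else:
        q = np.full(n, 1.0 / n)  # all points identical: fall back to uniform
    idx = rng.choice(n, size=m, replace=True, p=q)
    weights = 1.0 / (m * q[idx])
    return X[idx], weights
```

A downstream solver would then be run on the sampled points with these weights in place of the full data set; how close the result is to the full-data solution depends on `m` and on the objective.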

artificial intelligence, coreset, machine learning, (14 more...)

1703.06476

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Sample complexity of population recovery

Polyanskiy, Yury, Suresh, Ananda Theertha, Wu, Yihong

arXiv.org Machine Learning, Jun-4-2017

The problem of population recovery refers to estimating a distribution based on incomplete or corrupted samples. Consider a random poll of sample size $n$ conducted on a population of individuals, where each pollee is asked to answer $d$ binary questions. We consider one of the two polling impediments: (a) in lossy population recovery, a pollee may skip each question with probability $\epsilon$, (b) in noisy population recovery, a pollee may lie on each question with probability $\epsilon$. Given $n$ lossy or noisy samples, the goal is to estimate the probabilities of all $2^d$ binary vectors simultaneously within accuracy $\delta$ with high probability. This paper settles the sample complexity of population recovery. For the lossy model, the optimal sample complexity is $\tilde\Theta(\delta^{-2\max\{\frac{\epsilon}{1-\epsilon},1\}})$, improving the state of the art by Moitra and Saks in several ways: a lower bound is established, the upper bound is improved and the result depends at most on the logarithm of the dimension. Surprisingly, the sample complexity undergoes a phase transition from parametric to nonparametric rate when $\epsilon$ exceeds $1/2$. For noisy population recovery, the sharp sample complexity turns out to be more sensitive to dimension and scales as $\exp(\Theta(d^{1/3} \log^{2/3}(1/\delta)))$ except for the trivial cases of $\epsilon=0,1/2$ or $1$. For both models, our estimators simply compute the empirical mean of a certain function, which is found by pre-solving a linear program (LP). Curiously, the dual LP can be understood as Le Cam's method for lower-bounding the minimax risk, thus establishing the statistical optimality of the proposed estimators. The value of the LP is determined by complex-analytic methods.

artificial intelligence, optimization problem, population recovery, (17 more...)

1702.05574

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

The Mixing method: coordinate descent for low-rank semidefinite programming

Wang, Po-Wei, Chang, Wei-Cheng, Kolter, J. Zico

arXiv.org Machine Learning, Jun-1-2017

In this paper, we propose a coordinate descent approach to low-rank structured semidefinite programming. The approach, which we call the Mixing method, is extremely simple to implement, has no free parameters, and typically attains an order of magnitude or better improvement in optimization performance over the current state of the art. We show that for certain problems, the method is strictly decreasing and guaranteed to converge to a critical point. We then apply the algorithm to three separate domains: solving the maximum cut semidefinite relaxation, solving a (novel) maximum satisfiability relaxation, and solving the GloVe word embedding optimization problem. In all settings, we demonstrate improvement over the existing state of the art along various dimensions. In total, this work substantially expands the scope and scale of problems that can be solved using semidefinite programming methods.
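A coordinate descent scheme of the kind this abstract describes can be sketched for the maximum cut relaxation, written as minimizing $\langle C, V^\top V \rangle$ over unit-norm columns $v_i$ of $V$. The sketch below assumes a symmetric cost matrix `C` and a user-chosen rank `k`. Each step replaces one column by the negated, normalized weighted sum of the others, which is the exact minimizer over that column. The function names, the random initialization, and the fixed iteration count are assumptions for illustration rather than details taken from the paper.

```python
import numpy as np

def sdp_objective(C, V):
    """Low-rank SDP objective <C, V^T V>."""
    return float(np.sum(C * (V.T @ V)))

def mixing_maxcut(C, k, iters=100, seed=0):
    """Coordinate descent over unit-norm columns v_i of a k x n matrix V.

    Assumes C is a symmetric n x n NumPy array.
    """
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    V = rng.standard_normal((k, n))
    V /= np.linalg.norm(V, axis=0)  # start from random unit vectors
    for _ in range(iters):
        for i in range(n):
            # g = sum over j != i of c_ij * v_j
            g = V @ C[:, i] - C[i, i] * V[:, i]
            norm = np.linalg.norm(g)
            if norm > 0:
                # Exact minimizer of the objective over v_i on the unit sphere.
                V[:, i] = -g / norm
    return V
```

Because each update solves its one-column subproblem exactly, the objective never increases from one update to the next, and the loop has no step size to tune.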

artificial intelligence, machine learning, optimization problem, (15 more...)

1706.00476

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Bayesian $l_0$ Regularized Least Squares

Polson, Nicholas G., Sun, Lei

arXiv.org Machine Learning, May-31-2017

Bayesian $l_0$-regularized least squares provides a variable selection technique for high-dimensional predictors. The challenge in $l_0$ regularization is optimizing a non-convex objective function via a search over the model space consisting of all possible predictor combinations, an NP-hard task. Spike-and-slab (a.k.a. Bernoulli-Gaussian, BG) priors are the gold standard for Bayesian variable selection, with a caveat of computational speed and scalability. We show that a Single Best Replacement (SBR) algorithm is a fast scalable alternative. Although SBR calculates a sparse posterior mode, we show that it possesses a number of equivalences and optimality properties of a posterior mean. To illustrate our methodology, we provide simulation evidence and a real data example on the statistical properties and computational efficiency of SBR versus direct posterior sampling using spike-and-slab priors. Finally, we conclude with directions for future research.
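The greedy search behind a Single Best Replacement style algorithm can be sketched as follows: starting from an empty support, repeatedly toggle (add or remove) the single predictor that most decreases the penalized objective, and stop when no toggle helps. The objective scaling $\|y - X\beta\|^2 + \lambda \|\beta\|_0$, the brute-force refit with `numpy.linalg.lstsq`, and the function names are assumptions for this illustration. The paper's exact formulation and any efficient update formulas are not reproduced here.

```python
import numpy as np

def l0_objective(X, y, support, lam):
    """Residual sum of squares on `support` plus lam times its size."""
    if not support:
        return float(y @ y)
    cols = sorted(support)
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    r = y - X[:, cols] @ beta
    return float(r @ r) + lam * len(cols)

def single_best_replacement(X, y, lam):
    """Greedy add-or-remove search over supports; returns (support, value)."""
    support = set()
    best = l0_objective(X, y, support, lam)
    while True:
        # Evaluate every single-variable toggle (symmetric difference).
        candidates = [(l0_objective(X, y, support ^ {j}, lam), j)
                      for j in range(X.shape[1])]
        value, j = min(candidates)
        if value >= best - 1e-12:  # no toggle improves the objective
            return sorted(support), best
        support ^= {j}
        best = value
```

With an identity design matrix, `y = [3.0, 0.1, 0.0]` and `lam = 1.0`, this search keeps only the first predictor: dropping the residual from 9.01 to 0.01 is worth the unit penalty, while including the second predictor is not.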

artificial intelligence, bayesian inference, machine learning, (18 more...)

1706.00098

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Online to Offline Conversions, Universality and Adaptive Minibatch Sizes

Levy, Kfir Y.

arXiv.org Machine Learning, May-31-2017

We present an approach towards convex optimization that relies on a novel scheme which converts online adaptive algorithms into offline methods. In the offline optimization setting, our derived methods are shown to obtain favourable adaptive guarantees which depend on the harmonic sum of the queried gradients. We further show that our methods implicitly adapt to the objective's structure: in the smooth case fast convergence rates are ensured without any prior knowledge of the smoothness parameter, while still maintaining guarantees in the non-smooth setting. Our approach has a natural extension to the stochastic setting, resulting in a lazy version of SGD (stochastic gradient descent), where minibatches are chosen adaptively depending on the magnitude of the gradients, thus providing a principled approach towards choosing minibatch sizes.

algorithm, artificial intelligence, machine learning, (19 more...)

1705.10499

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
