Optimization
Static & DYNAMICAL Machine Learning – What is the Difference?
In an earlier blog post, "Need for DYNAMICAL Machine Learning: Bayesian exact recursive estimation", I introduced the need for Dynamical ML as we now enter the "Walk" stage of the "Crawl-Walk-Run" evolution of machine learning. First, I defined Static ML as follows: given a set of inputs and outputs, find a static map between the two during supervised "Training" and use this static map for business purposes during "Operation". I made the following points using IoT as an example. A Dynamical ML solution involves a State-Space data model (more below). What more does a Dynamical ML solution offer?
Truncated Variance Reduction: A Unified Approach to Bayesian Optimization and Level-Set Estimation
Bogunovic, Ilija, Scarlett, Jonathan, Krause, Andreas, Cevher, Volkan
We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important settings that are typically non-trivial to incorporate into myopic algorithms, including pointwise costs and heteroscedastic noise. We provide a general theoretical guarantee for TruVaR covering these aspects, and use it to recover and strengthen existing results on BO and LSE. Moreover, we provide a new result for a setting where one can select from a number of noise levels having associated costs. We demonstrate the effectiveness of the algorithm on both synthetic and real-world data sets.
A global optimization algorithm for sparse mixed membership matrix factorization
Zhang, Fan, Wang, Chuangqi, Trapp, Andrew, Flaherty, Patrick
Mixed membership factorization is a popular approach for analyzing data sets that have within-sample heterogeneity. In recent years, several algorithms have been developed for mixed membership matrix factorization, but they only guarantee estimates from a local optimum. Here, we derive a global optimization (GOP) algorithm that provides a guaranteed $\epsilon$-global optimum for a sparse mixed membership matrix factorization problem. We test the algorithm on simulated data and find the algorithm always bounds the global optimum across random initializations and explores multiple modes efficiently.
Markov Chain methods for the bipartite Boolean quadratic programming problem
Karapetyan, Daniel, Punnen, Abraham P., Parkes, Andrew J.
We study the Bipartite Boolean Quadratic Programming Problem (BBQP), which is an extension of the well-known Boolean Quadratic Programming Problem (BQP). Applications of the BBQP include mining discrete patterns from binary data, approximating matrices by rank-one binary matrices, computing the cut-norm of a matrix, and solving optimisation problems such as maximum weight biclique, bipartite maximum weight cut, maximum weight induced subgraph of a bipartite graph, etc. For the BBQP, we first present several algorithmic components, specifically, hill climbers and mutations, and then show how to combine them in a high-performance metaheuristic. Instead of hand-tuning a standard metaheuristic to test the efficiency of the hybrid of the components, we chose to use an automated generation of a multi-component metaheuristic to save human time, and also to improve objectivity in the analysis and comparisons of components. For this we designed a new metaheuristic schema which we call Conditional Markov Chain Search (CMCS). We show that CMCS is flexible enough to model several standard metaheuristics; this flexibility is controlled by multiple numeric parameters, and so is convenient for automated generation. We study the configurations revealed by our approach and show that the best of them outperforms the previous state-of-the-art BBQP algorithm by several orders of magnitude. In our experiments we use benchmark instances introduced in the preliminary version of this paper and described here, which have already become the de facto standard in the BBQP literature.
Keywords: artificial intelligence, bipartite Boolean quadratic programming, automated heuristic configuration, benchmark
1. Introduction
The (Unconstrained) Boolean Quadratic Programming Problem (BQP) is to maximise a quadratic function f(x) of a binary vector x. The BQP is a well-studied problem in the operational research literature [6].
The focus of this paper is on a problem closely related to BQP, called the Bipartite (Unconstrained) Boolean Quadratic Programming Problem (BBQP) [23]. A graph theoretic interpretation of the BBQP can be given as follows [23]. Consider a bipartite graph G = (I, J, E). [...] M otherwise, where M is a large positive constant. Then BBQP(Q, c, d) solves the MWBP [23].
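To make the "hill climber" component mentioned in the abstract more concrete, here is a minimal sketch in Python. It assumes the usual BBQP objective f(x, y) = x^T Q y + c·x + d·y over binary vectors x and y; the excerpt above only names BBQP(Q, c, d), so this form and the function names `bbqp_value`, `best_y_given_x`, `best_x_given_y` and `alternating_hill_climber` are illustrative assumptions, not the authors' implementation. The idea is that once x is fixed the objective is linear in y (and vice versa), so each half can be optimised exactly in turn.

```python
def bbqp_value(Q, c, d, x, y):
    """Assumed objective: f(x, y) = x^T Q y + c.x + d.y for binary x, y."""
    quad = sum(Q[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))
    return (quad
            + sum(ci * xi for ci, xi in zip(c, x))
            + sum(dj * yj for dj, yj in zip(d, y)))


def best_y_given_x(Q, d, x):
    # With x fixed, f is linear in y: switch y_j on only if its coefficient is positive.
    return [1 if sum(Q[i][j] * x[i] for i in range(len(x))) + d[j] > 0 else 0
            for j in range(len(d))]


def best_x_given_y(Q, c, y):
    # Symmetric step: with y fixed, f is linear in x.
    return [1 if sum(Q[i][j] * y[j] for j in range(len(y))) + c[i] > 0 else 0
            for i in range(len(c))]


def alternating_hill_climber(Q, c, d, x0):
    """Alternate exact updates of x and y until the objective stops improving."""
    x = list(x0)
    y = best_y_given_x(Q, d, x)
    best = bbqp_value(Q, c, d, x, y)
    while True:
        new_x = best_x_given_y(Q, c, y)
        new_y = best_y_given_x(Q, d, new_x)
        value = bbqp_value(Q, c, d, new_x, new_y)
        if value <= best:  # no strict improvement: local optimum reached
            return x, y, best
        x, y, best = new_x, new_y, value


# Hypothetical 2x2 instance, chosen only for illustration.
Q = [[3, -2], [-4, 5]]
c = [-1, -1]
d = [0, 0]
print(alternating_hill_climber(Q, c, d, [1, 0]))  # ([1, 0], [1, 0], 2)
print(alternating_hill_climber(Q, c, d, [0, 1]))  # ([0, 1], [0, 1], 4)
```

On this toy instance the climber stalls at value 2 from one start and reaches 4 from the other, which hints at why the abstract pairs hill climbers with mutations inside a metaheuristic.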
Focused Model-Learning and Planning for Non-Gaussian Continuous State-Action Systems
Wang, Zi, Jegelka, Stefanie, Kaelbling, Leslie Pack, Lozano-Pérez, Tomás
We introduce a framework for model learning and planning in stochastic domains with continuous state and action spaces and non-Gaussian transition models. It is efficient because (1) local models are estimated only when the planner requires them; (2) the planner focuses on the most relevant states to the current planning problem; and (3) the planner focuses on the most informative and/or high-value actions. Our theoretical analysis shows the validity and asymptotic optimality of the proposed approach. Empirically, we demonstrate the effectiveness of our algorithm on a simulated multi-modal pushing problem.
A Multi-Batch L-BFGS Method for Machine Learning
Berahas, Albert S., Nocedal, Jorge, Takáč, Martin
The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods that use a sizeable fraction of the training set at each iteration to facilitate parallelism, and that employ second-order information. In order to improve the learning process, we follow a multi-batch approach in which the batch changes at each iteration. This can cause difficulties because L-BFGS employs gradient differences to update the Hessian approximations, and when these gradients are computed using different data points the process can be unstable. This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, illustrates the behavior of the algorithm in a distributed computing platform, and studies its convergence properties for both the convex and nonconvex cases.
Independent Component Analysis by Entropy Maximization with Kernels
Boukouvalas, Zois, Mowakeaa, Rami, Fu, Geng-Shen, Adali, Tulay
Independent component analysis (ICA) is the most popular method for blind source separation (BSS) with a diverse set of applications, such as biomedical signal processing, video and image analysis, and communications. Maximum likelihood (ML), an optimal theoretical framework for ICA, requires knowledge of the true underlying probability density function (PDF) of the latent sources, which, in many applications, is unknown. ICA algorithms cast in the ML framework often deviate from its theoretical optimality properties due to poor estimation of the source PDF. Therefore, accurate estimation of source PDFs is critical in order to avoid model mismatch and poor ICA performance. In this paper, we propose a new and efficient ICA algorithm based on entropy maximization with kernels (ICA-EMK), which uses both global and local measuring functions as constraints to dynamically estimate the PDF of the sources with reasonable complexity. In addition, the new algorithm performs optimization with respect to each of the cost function gradient directions separately, enabling parallel implementations on multi-core computers. We demonstrate the superior performance of ICA-EMK over competing ICA algorithms using simulated as well as real-world data.
Multi-objective Reinforcement Learning through Continuous Pareto Manifold Approximation
Parisi, Simone, Pirotta, Matteo, Restelli, Marcello
Many real-world control applications, from economics to robotics, are characterized by the presence of multiple conflicting objectives. In these problems, the standard concept of optimality is replaced by Pareto-optimality and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, achieving an accurate representation of the Pareto frontier is still an important challenge. In this paper, we propose a reinforcement learning policy gradient approach to learn a continuous approximation of the Pareto frontier in multi-objective Markov Decision Problems (MOMDPs). Differently from previous policy gradient algorithms, where n optimization routines are executed to have n solutions, our approach performs a single gradient ascent run, generating at each step an improved continuous approximation of the Pareto frontier. The idea is to optimize the parameters of a function defining a manifold in the policy parameters space, so that the corresponding image in the objectives space gets as close as possible to the true Pareto frontier. Besides deriving how to compute and estimate such gradient, we will also discuss the non-trivial issue of defining a metric to assess the quality of the candidate Pareto frontiers. Finally, the properties of the proposed approach are empirically evaluated on two problems, a linear-quadratic Gaussian regulator and a water reservoir control task.
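As a small illustration of the Pareto-optimality notion used in this abstract, the following Python sketch filters a finite set of objective vectors down to its non-dominated subset. It assumes every objective is to be maximised, and the names `dominates` and `pareto_front` are hypothetical. The paper itself learns a continuous approximation of the frontier with policy gradients, so this discrete filter only shows what "frontier" means and is not the proposed method.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives assumed to be maximised)."""
    return (all(ai >= bi for ai, bi in zip(a, b))
            and any(ai > bi for ai, bi in zip(a, b)))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical returns of six policies on two conflicting objectives.
returns = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
print(pareto_front(returns))  # [(1, 5), (2, 4), (3, 3)]
```

The three surviving points are the different compromises among the objectives; none of them can be improved on one objective without losing on the other.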
Stochastic Heavy Ball
Gadat, Sébastien, Panloup, Fabien, Saadane, Sofiane
This paper deals with a natural stochastic optimization procedure derived from the so-called Heavy-ball method differential equation, which was introduced by Polyak in the 1960s with his seminal contribution [Pol64]. The Heavy-ball method is a second-order dynamics that was investigated to minimize convex functions f. The family of second-order methods recently received a large amount of attention, until the famous contribution of Nesterov [Nes83], leading to the explosion of large-scale optimization problems. This work provides an in-depth description of the stochastic heavy-ball method, which is an adaptation of the deterministic one when only unbiased evaluations of the gradient are available and used throughout the iterations of the algorithm. We first describe some almost sure convergence results in the case of general non-convex coercive functions f. We then examine the situation of convex and strongly convex potentials and derive some non-asymptotic results about the stochastic heavy-ball method. We end our study with limit theorems on several rescaled algorithms.
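The idea of running a heavy-ball iteration with only noisy, unbiased gradient evaluations can be sketched as follows. This uses the classical momentum form of Polyak's method with a constant step size and additive Gaussian noise; the function name, the parameter names and all default values are assumptions made for illustration, and the discretisation studied in the paper may differ.

```python
import random


def stochastic_heavy_ball(grad, x0, steps=2000, lr=0.01, momentum=0.9,
                          noise=0.1, seed=0):
    """Minimise a one-dimensional function from noisy gradient evaluations.

    grad     : exact gradient; zero-mean Gaussian noise is added to mimic an
               unbiased stochastic oracle (an assumption of this sketch)
    momentum : weight given to the previous displacement (the "heavy ball")
    """
    rng = random.Random(seed)
    x, velocity = x0, 0.0
    for _ in range(steps):
        noisy_gradient = grad(x) + rng.gauss(0.0, noise)
        velocity = momentum * velocity - lr * noisy_gradient
        x = x + velocity
    return x


# Strongly convex toy potential f(x) = (x - 3)^2, whose minimiser is x = 3.
x_final = stochastic_heavy_ball(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_final)
```

With a constant step size the iterate does not settle exactly at the minimiser but fluctuates in a small neighbourhood of it.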
UTA-poly and UTA-splines: additive value functions with polynomial marginals
Sobrie, Olivier, Gillis, Nicolas, Mousseau, Vincent, Pirlot, Marc
Additive utility function models are widely used in multiple criteria decision analysis. In such models, a numerical value is associated with each alternative involved in the decision problem. It is computed by aggregating the scores of the alternative on the different criteria of the decision problem. The score of an alternative on a criterion is determined by a marginal value function that evolves monotonically as a function of the performance of the alternative on this criterion. Determining the shape of the marginals is not easy for a decision maker. It is easier for the decision maker to make statements such as "alternative a is preferred to b". In order to help the decision maker, UTA disaggregation procedures use linear programming to approximate the marginals by piecewise linear functions based only on such statements. In this paper, we propose to infer polynomials and splines instead of piecewise linear functions for the marginals. To this end, we use semidefinite programming instead of linear programming. We illustrate this new elicitation method and present some experimental results.
Introduction
The theory of value functions aims at assigning a number to each alternative in such a way that the decision maker's preference order on the alternatives is the same as the order on the numbers associated with the alternatives. The number or value associated with an alternative is a monotone function of its evaluations on the various relevant criteria. For preferences satisfying some additional properties (including preferential independence), the value of an alternative can be obtained as the sum of marginal value functions each depending only on a single criterion [20, Chapter 6]. These functions usually are monotone, i.e., marginal value functions either increase or decrease with the assessment of the alternative on the associated criterion. Many questioning protocols have been proposed aiming to elicit an additive value function [20, 9] through interactions with the decision maker (DM).
These direct elicitation methods are time-consuming and require a substantial cognitive effort from the DM. Therefore, in certain cases, an indirect approach may prove fruitful. The latter consists in learning an additive value model (or a set of such models) from a set of declared or observed preferences. Learning approaches have been proposed not only for inferring an additive value function that is used to rank all other alternatives.
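The additive model described above, with the piecewise linear marginals that classical UTA works with, can be sketched in a few lines of Python. The breakpoints, marginal values and the two alternatives below are made up for illustration, and `piecewise_linear` and `additive_value` are hypothetical helper names. The sketch only evaluates a given model; it does not perform the linear or semidefinite programming step that infers the marginals from preference statements.

```python
from bisect import bisect_right


def piecewise_linear(breakpoints, values):
    """Build a marginal value function that linearly interpolates the
    points (breakpoints[k], values[k]); breakpoints must be increasing."""
    def marginal(x):
        if x <= breakpoints[0]:
            return values[0]
        if x >= breakpoints[-1]:
            return values[-1]
        k = bisect_right(breakpoints, x) - 1  # segment containing x
        x0, x1 = breakpoints[k], breakpoints[k + 1]
        t = (x - x0) / (x1 - x0)
        return values[k] + t * (values[k + 1] - values[k])
    return marginal


def additive_value(marginals, performances):
    """Value of an alternative: sum of its marginal values, one per criterion."""
    return sum(u(p) for u, p in zip(marginals, performances))


# Two illustrative criteria: price (lower is better) and quality (higher is better).
price = piecewise_linear([0, 50, 100], [0.6, 0.2, 0.0])   # decreasing marginal
quality = piecewise_linear([0, 5, 10], [0.0, 0.1, 0.4])   # increasing marginal

a = (25, 10)  # cheap, high quality
b = (75, 5)   # expensive, medium quality
print(additive_value([price, quality], a) > additive_value([price, quality], b))  # True
```

Under this hypothetical model the statement "alternative a is preferred to b" corresponds to a receiving the larger additive value.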