Optimization
Meta-Learning for Black-box Optimization
TV, Vishnu, Malhotra, Pankaj, Narwariya, Jyoti, Vig, Lovekesh, Shroff, Gautam
Recently, neural networks trained as optimizers under the "learning to learn" or meta-learning framework have been shown to be effective for a broad range of optimization tasks including derivative-free black-box function optimization. Recurrent neural networks (RNNs) trained to optimize a diverse set of synthetic non-convex differentiable functions via gradient descent have been effective at optimizing derivative-free black-box functions. In this work, we propose RNN-Opt: an approach for learning RNN-based optimizers for optimizing real-parameter single-objective continuous functions under limited budget constraints. Existing approaches utilize an observed improvement based meta-learning loss function for training such models. We propose training RNN-Opt by using synthetic non-convex functions with known (approximate) optimal values by directly using discounted regret as our meta-learning loss function. We hypothesize that a regret-based loss function mimics typical testing scenarios, and would therefore lead to better optimizers compared to optimizers trained only to propose queries that improve over previous queries. Further, RNN-Opt incorporates simple yet effective enhancements during training and inference procedures to deal with the following practical challenges: i) Unknown range of possible values for the black-box function to be optimized, and ii) Practical and domain-knowledge based constraints on the input parameters. We demonstrate the efficacy of RNN-Opt in comparison to existing methods on several synthetic as well as standard benchmark black-box functions along with an anonymized industrial constrained optimization problem.
Highly parallel algorithm for the Ising ground state searching problem
Yavorsky, A., Markovich, L. A., Polyakov, E. A., Rubtsov, A. N.
Finding an energy minimum in the Ising model is an exemplar objective, associated with many combinatorial optimization problems, that is computationally hard in general, but occurs in all areas of modern science. There are several numerical methods, providing solution for the medium size Ising spin systems. However, they are either computationally slow and badly parallelized, or do not give sufficiently good results for the large systems. In this paper, we present a highly parallel algorithm, called Mean-field Annealing from a Random State (MARS), incorporating the best features of the classical simulated annealing (SA) and Mean-Field Annealing (MFA) methods. The algorithm is based on the mean-field descent from a randomly selected configuration and temperature. Since a single run requires little computational effort, the effectiveness can be achieved by massive parallelisation. MARS shows excellent performance both on the large Ising spin systems and on the set of exemplary maximum cut benchmark instances in terms of both solution quality and computational time.
On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost
Yang, Zhuoran, Chen, Yongxin, Hong, Mingyi, Wang, Zhaoran
Compared with the classical policy gradient algorithm 1992), actor-critic tracks the action-value function (critic) in policy gradient in an online(Williams, manner, and alternatively updates the policy (actor) and the critic. On the one hand, the online update of critic significantly reduces the variance of policy gradient and hence leads to faster convergence. On the other hand, it also introduces algorithmic instability, which is often observed in practice (Islam et al., 2017) and parallels the notoriously unstable training of generative adversarial and Vinyals, 2016). Such instability of actor-critic originates from severalnetworks (Pfau intertwining challenges, including(i) function approximation of actor and critic, (ii) improper choice of stepsizes, (iii) the noise arising from stochastic approximation, (iv) the asynchrony between actor and critic, and (v) possibly off-policy data used in the update of critic. As a result, the convergence of actor-critic remains much less well understood than that of policy gradient, which itself is open. Consequently, the practical use of actor-critic often lacks theoretical guidance. In this paper, we aim to theoretically understand the algorithmic instability of actor-critic.
Minimal Sample Subspace Learning: Theory and Algorithms
Subspace segmentation or subspace learning is a challenging and complicated task in machine learning. This paper builds a primary frame and solid theoretical bases for the minimal subspace segmentation (MSS) of finite samples. Existence and conditional uniqueness of MSS are discussed with conditions generally satisfied in applications. Utilizing weak prior information of MSS, the minimality inspection of segments is further simplified to the prior detection of partitions. The MSS problem is then modeled as a computable optimization problem via self-expressiveness of samples. A closed form of representation matrices is first given for the self-expressiveness, and the connection of diagonal blocks is then addressed. The MSS model uses a rank restriction on the sum of segment ranks. Theoretically, it can retrieve the minimal sample subspaces that could be heavily intersected. The optimization problem is solved via a basic manifold conjugate gradient algorithm, alternative optimization and hybrid optimization, taking into account of solving both the primal MSS problem and its pseudo-dual problem. The MSS model is further modified for handling noisy data, and solved by an ADMM algorithm. The reported experiments show the strong ability of the MSS method on retrieving minimal sample subspaces that are heavily intersected.
A Study and Analysis of a Feature Subset Selection Technique using Penguin Search Optimization Algorithm (FS-PeSOA)
Dasgupta, Agnip, Banerjee, Ardhendu, Dastidar, Aniket Ghosh, Barman, Antara, Chakraborty, Sanjay
In today world of enormous amounts of data, it is very important to extract useful knowledge from it. This can be accomplished by feature subset selection. Feature subset selection is a method of selecting a minimum number of features with the help of which our machine can learn and predict which class a particular data belongs to. We will introduce a new adaptive algorithm called Feature selection Penguin Search optimization algorithm which is a metaheuristic approach. It is adapted from the natural hunting strategy of penguins in which a group of penguins take jumps at random depths and come back and share the status of food availability with other penguins and in this way, the global optimum solution is found. In order to explore the feature subset candidates, the bioinspired approach Penguin Search optimization algorithm generates during the process a trial feature subset and estimates its fitness value by using three different classifiers for each case: Random Forest, Nearest Neighbour and Support Vector Machines. However, we are planning to implement our proposed approach Feature selection Penguin Search optimization algorithm on some well known benchmark datasets collected from the UCI repository and also try to evaluate and compare its classification accuracy with some state of art algorithms.
Learning to Handle Parameter Perturbations in Combinatorial Optimization: an Application to Facility Location
Lodi, Andrea, Mossina, Luca, Rachelson, Emmanuel
We present an approach to couple the resolution of Combinatorial Optimization problems with methods from Machine Learning, applied to the single source, capacitated, facility location problem. Our study is framed in the context where a reference facility location optimization problem is given. Assuming there exist data for many variations of the reference problem (historical or simulated) along with their optimal solution, we study how one can exploit these to make predictions about an unseen new instance. We demonstrate how a classifier can be built from these data to determine whether the solution to the reference problem still applies to a new instance. In case the reference solution is partially applicable, we build a regressor indicating the magnitude of the expected change, and conversely how much of it can be kept for the new instance. This insight, derived from a priori information, is expressed via an additional constraint in the original mathematical programming formulation. We present an empirical evaluation and discuss the benefits, drawbacks and perspectives of such an approach. Although presented through the application to the facility location problem, the approach developed here is general and explores a new perspective on the exploitation of past experience in combinatorial optimization.
Dual Extrapolation for Sparse Generalized Linear Models
Massias, Mathurin, Vaiter, Samuel, Gramfort, Alexandre, Salmon, Joseph
Generalized Linear Models (GLM) form a wide class of regression and classification models, where prediction is a function of a linear combination of the input variables. For statistical inference in high dimension, sparsity inducing regularizations have proven to be useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even for popular iterative algorithms such as coordinate descent, one needs to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets diminish the size of the optimization problem at hand, either by progressively removing variables, or by solving a growing sequence of smaller problems. For both techniques, significant variables are identified thanks to convex duality arguments. In this paper, we show that the dual iterates of a GLM exhibit a Vector AutoRegressive (VAR) behavior after sign identification, when the primal problem is solved with proximal gradient descent or cyclic coordinate descent. Exploiting this regularity, one can construct dual points that offer tighter certificates of optimality, enhancing the performance of screening rules and helping to design competitive working set algorithms.
Fairness without Regret
A popular approach of achieving fairness in optimization problems is by constraining the solution space to "fair" solutions, which unfortunately typically reduces solution quality. In practice, the ultimate goal is often an aggregate of sub-goals without a unique or best way of combining them or which is otherwise only partially known. I turn this problem into a feature and suggest to use a parametrized objective and vary the parameters within reasonable ranges to get a "set" of optimal solutions, which can then be optimized using secondary criteria such as fairness without compromising the primary objective, i.e. without regret (societal cost).
Learning to Generate Industrial SAT Instances
Wu, Haoze (Stanford University) | Ramanujan, Raghuram (Davidson College)
In this paper, we present Satgen, the first implicit generative model of real-world Boolean Satisfiability (SAT) formulas. Our approach uses unsupervised machine learning techniques to generate new formulas by mimicking the structural properties of a given input formula Phi. We proceed in two phases: first, we construct the Literal-Incidence Graph (LIG) of Phi. This is used by a Generative Adversarial Network (GAN) to generate new LIGs that exhibit graph-theoretic properties similar to those of the LIG of Phi. In the second phase, we extract a formula whose LIG would correspond to the generated graph. We show that generating such a formula is equivalent to finding a minimal clique edge cover of the given graph, which we tackle efficiently using a greedy hill-climbing algorithm. We verify experimentally that our approach generates formulas that closely resemble a given real-world SAT instance, as measured by a range of different metrics.
A General Interactive Approach for Solving Multi-Objective Combinatorial Optimization Problems with Imprecise Preferences
Benabbou, Nawal (LIP6) | Lust, Thibaut (Sorbonne University)
In this paper, we develop a general interactive method to solve multi-objective combinatorial optimization problems with imprecise preferences. Assuming that preferences can be represented by a parameterized scalarizing function, we iteratively ask preferences queries to the decision maker in order to reduce the uncertainty over the preference parameters until being able to determine her preferred solution. To produce informative preference queries at each step, we generate promising solutions using the extreme points of the polyhedron representing the admissible preference parameters and then we ask the decision maker to compare two of these solutions (we propose different selection strategies). These extreme points are also used to provide a stopping criterion guaranteeing that the returned solution is optimal (or near-optimal) according to the decision maker's preferences. For the multi-objective spanning tree problem with a linear aggregation function, we provide numerical results to demonstrate the practical efficiency of our approach and we compare our results to a recent approach based on minimax regret, where preferences are asked during the construction of a solution. We show that better results are achieved by our method both in terms of running time and number of questions.