Optimization
Data Driven Resource Allocation for Distributed Learning
Dick, Travis, Li, Mu, Pillutla, Venkata Krishna, White, Colin, Balcan, Maria Florina, Smola, Alex
In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally, classification rules of high accuracy tend to be "locally simple but globally complex" (Vapnik & Bottou 1993), we propose data dependent dispatching that takes advantage of such structure. We present an in-depth analysis of this model, providing new algorithms with provable worst-case guarantees, analysis proving existing scalable heuristics perform well in natural non worst-case conditions, and techniques for extending a dispatching rule from a small sample to the entire distribution. We overcome novel technical challenges to satisfy important conditions for accurate distributed learning, including fault tolerance and balancedness. We empirically compare our approach with baselines based on random partitioning, balanced partition trees, and locality sensitive hashing, showing that we achieve significantly higher accuracy on both synthetic and real world image and advertising datasets. We also demonstrate that our technique strongly scales with the available computing power.
Quantum Monte Carlo simulation of a particular class of non-stoquastic Hamiltonians in quantum annealing
Quantum annealing is a generic solver of the optimization problem that uses fictitious quantum fluctuation. Its simulation in classical computing is often performed using the quantum Monte Carlo simulation via the Suzuki--Trotter decomposition. However, the negative sign problem sometimes emerges in the simulation of quantum annealing with an elaborate driver Hamiltonian, since it belongs to a class of non-stoquastic Hamiltonians. In the present study, we propose an alternative way to avoid the negative sign problem involved in a particular class of the non-stoquastic Hamiltonians. To check the validity of the method, we demonstrate our method by applying it to a simple problem that includes the anti-ferromagnetic XX interaction, which is a typical instance of the non-stoquastic Hamiltonians.
Towards Adaptive Training of Agent-based Sparring Partners for Fighter Pilots
Israelsen, Brett W., Ahmed, Nisar, Center, Kenneth, Green, Roderick, Bennett, Winston Jr
A key requirement for the current generation of artificial decision-makers is that they should adapt well to changes in unexpected situations. This paper addresses the situation in which an AI for aerial dog fighting, with tunable parameters that govern its behavior, must optimize behavior with respect to an objective function that is evaluated and learned through simulations. Bayesian optimization with a Gaussian Process surrogate is used as the method for investigating the objective function. One key benefit is that during optimization, the Gaussian Process learns a global estimate of the true objective function, with predicted outcomes and a statistical measure of confidence in areas that haven't been investigated yet. Having a model of the objective function is important for being able to understand possible outcomes in the decision space; for example this is crucial for training and providing feedback to human pilots. However, standard Bayesian optimization does not perform consistently or provide an accurate Gaussian Process surrogate function for highly volatile objective functions. We treat these problems by introducing a novel sampling technique called Hybrid Repeat/Multi-point Sampling. This technique gives the AI ability to learn optimum behaviors in a highly uncertain environment. More importantly, it not only improves the reliability of the optimization, but also creates a better model of the entire objective surface. With this improved model the agent is equipped to more accurately/efficiently predict performance in unexplored scenarios.
Constrained Maximum Correntropy Adaptive Filtering
Peng, Siyuan, Chen, Badong, Sun, Lei, Lin, Zhiping, Ser, Wee
Constrained adaptive filtering algorithms inculding constrained least mean square (CLMS), constrained affine projection (CAP) and constrained recursive least squares (CRLS) have been extensively studied in many applications. Most existing constrained adaptive filtering algorithms are developed under mean square error (MSE) criterion, which is an ideal optimality criterion under Gaussian noises. This assumption however fails to model the behavior of non-Gaussian noises found in practice. Motivated by the robustness and simplicity of maximum correntropy criterion (MCC) in non-Gaussian impulsive noises, this paper proposes a new adaptive filtering algorithm called constrained maximum correntropy criterion (CMCC). Specifically, CMCC incorporates a linear constraint into a MCC filter to solve a constrained optimization problem explicitly. The proposed adaptive filtering algorithm is easy to implement and has low computational complexity, and in terms of convergence accuracy (say lower mean square deviation) and stability, can significantly outperform those MSE based constrained adaptive algorithms in presence of heavy-tailed impulsive noises. Additionally, the mean square convergence behaviors are studied under energy conservation relation, and a sufficient condition to ensure the mean square convergence and the steady-state mean square deviation (MSD) of the proposed algorithm are obtained. Simulation results confirm the theoretical predictions under both Gaussian and non- Gaussian noises, and demonstrate the excellent performance of the novel algorithm by comparing it with other conventional methods.
Low-Rank Inducing Norms with Optimality Interpretations
Grussler, Christian, Giselsson, Pontus
Optimization problems with rank constraints appear in many diverse fields such as control, machine learning and image analysis. Since the rank constraint is non-convex, these problems are often approximately solved via convex relaxations. Nuclear norm regularization is the prevailing convexifying technique for dealing with these types of problem. This paper introduces a family of low-rank inducing norms and regularizers which includes the nuclear norm as a special case. A posteriori guarantees on solving an underlying rank constrained optimization problem with these convex relaxations are provided. We evaluate the performance of the low-rank inducing norms on three matrix completion problems. In all examples, the nuclear norm heuristic is outperformed by convex relaxations based on other low-rank inducing norms. For two of the problems there exist low-rank inducing norms that succeed in recovering the partially unknown matrix, while the nuclear norm fails. These low-rank inducing norms are shown to be representable as semi-definite programs and to have cheaply computable proximal mappings. The latter makes it possible to also solve problems of large size with the help of scalable first-order methods. Finally, it is proven that our findings extend to the more general class of atomic norms. In particular, this allows us to solve corresponding vector-valued problems, as well as problems with other non-convex constraints.
Advancing Bayesian Optimization: The Mixed-Global-Local (MGL) Kernel and Length-Scale Cool Down
Wabersich, Kim Peter, Toussaint, Marc
Bayesian Optimization (BO) has become a core method for solving expensive black-box optimization problems. While much research focussed on the choice of the acquisition function, we focus on online length-scale adaption and the choice of kernel function. Instead of choosing hyperparameters in view of maximum likelihood on past data, we propose to use the acquisition function to decide on hyperparameter adaptation more robustly and in view of the future optimization progress. Further, we propose a particular kernel function that includes non-stationarity and local anisotropy and thereby implicitly integrates the efficiency of local convex optimization with global Bayesian optimization. Comparisons to state-of-the art BO methods underline the efficiency of these mechanisms on global optimization benchmarks.
Safety Verification and Control for Collision Avoidance at Road Intersections
Ahn, Heejin, Del Vecchio, Domitilla
This paper presents the design of a supervisory algorithm that monitors safety at road intersections and overrides drivers with a safe input when necessary. The design of the supervisor consists of two parts: safety verification and control design. Safety verification is the problem to determine if vehicles will be able to cross the intersection without colliding with current drivers' inputs. We translate this safety verification problem into a jobshop scheduling problem, which minimizes the maximum lateness and evaluates if the optimal cost is zero. The zero optimal cost corresponds to the case in which all vehicles can cross each conflict area without collisions. Computing the optimal cost requires solving a Mixed Integer Nonlinear Programming (MINLP) problem due to the nonlinear second-order dynamics of the vehicles. We therefore estimate this optimal cost by formulating two related Mixed Integer Linear Programming (MILP) problems that assume simpler vehicle dynamics. We prove that these two MILP problems yield lower and upper bounds of the optimal cost. We also quantify the worst case approximation errors of these MILP problems. We design the supervisor to override the vehicles with a safe control input if the MILP problem that computes the upper bound yields a positive optimal cost. We theoretically demonstrate that the supervisor keeps the intersection safe and is non-blocking. Computer simulations further validate that the algorithms can run in real time for problems of realistic size.
Distributed Optimization of Multi-Class SVMs
Alber, Maximilian, Zimmert, Julian, Dogan, Urun, Kloft, Marius
Training of one-vs.-rest SVMs can be parallelized over the number of classes in a straight forward way. Given enough computational resources, one-vs.-rest SVMs can thus be trained on data involving a large number of classes. The same cannot be stated, however, for the so-called all-in-one SVMs, which require solving a quadratic program of size quadratically in the number of classes. We develop distributed algorithms for two all-in-one SVM formulations (Lee et al. and Weston and Watkins) that parallelize the computation evenly over the number of classes. This allows us to compare these models to one-vs.-rest SVMs on unprecedented scale. The results indicate superior accuracy on text classification data.
Fair task allocation in transportation
Ye, Qing Chuan, Zhang, Yingqian, Dekker, Rommert
Traditionally, optimization of task allocation problems considered only the costs involved in the allocation. However, there has been in recent years more attention to cases where cost should not always be the sole consideration (Campbell et al. 2008). There are circumstances when other criteria need to be taken into account as well during the decision making process. Fairness has been considered as one of the important additional criteria in many application domains (Ogryczak et al. 2005, Gopinathan and Li 2011, Bertsimas et al. 2012). Although there is no common definition for the term, there are two fairness criteria that are often used in the literature: the Nash bargaining criterion and the Rawlsian maximin criterion. The former is based on Nash's four axioms of pareto optimality, independency of irrelevant alternatives, symmetry, and invariance to affine transformations or equivalent utility representations (Nash 1950). The latter is based on Rawls' two principles of justice (Rawls 1971). Rawls' maximin criterion maximizes the welfare level of the worst-off group member and has therefore been used in allocation problems (Jaffe 1981, Kumar and Kleinberg 2000).