Optimization
The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM
Davis, Damek, Edmunds, Brent, Udell, Madeleine
We introduce the Stochastic Asynchronous Proximal Alternating Linearized Minimization (SAPALM) method, a block coordinate stochastic proximal-gradient method for solving nonconvex, nonsmooth optimization problems. SAPALM is the first asynchronous parallel optimization method that provably converges on a large class of nonconvex, nonsmooth problems. We prove that SAPALM matches the best known rates of convergence --- among synchronous or asynchronous methods --- on this problem class. We provide upper bounds on the number of workers for which we can expect to see a linear speedup, which match the best bounds known for less complex problems, and show that in practice SAPALM achieves this linear speedup. We demonstrate state-of-the-art performance on several matrix factorization problems.
Parametric Simplex Method for Sparse Learning
Pang, Haotian, Liu, Han, Vanderbei, Robert J., Zhao, Tuo
High dimensional sparse learning has imposed a great computational challenge to large scale data analysis. In this paper, we investiage a broad class of sparse learning approaches formulated as linear programs parametrized by a {\em regularization factor}, and solve them by the parametric simplex method (PSM). PSM offers significant advantages over other competing methods: (1) PSM naturally obtains the complete solution path for all values of the regularization parameter; (2) PSM provides a high precision dual certificate stopping criterion; (3) PSM yields sparse solutions through very few iterations, and the solution sparsity significantly reduces the computational cost per iteration. Particularly, we demonstrate the superiority of PSM over various sparse learning approaches, including Dantzig selector for sparse linear regression, sparse support vector machine for sparse linear classification, and sparse differential network estimation. We then provide sufficient conditions under which PSM always outputs sparse solutions such that its computational performance can be significantly boosted.
Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization
Beloborodov, Dmitrii, Ulanov, A. E., Foerster, Jakob N., Whiteson, Shimon, Lvovsky, A. I.
Quantum hardware and quantum-inspired algorithms are becoming increasingly popular for combinatorial optimization. However, these algorithms may require careful hyperparameter tuning for each problem instance. We use a reinforcement learning agent in conjunction with a quantum-inspired algorithm to solve the Ising energy minimization problem, which is equivalent to the Maximum Cut problem. The agent controls the algorithm by tuning one of its parameters with the goal of improving recently seen solutions. We propose a new Rescaled Ranked Reward (R3) method that enables stable single-player version of self-play training that helps the agent to escape local optima. The training on any problem instance can be accelerated by applying transfer learning from an agent trained on randomly generated problems. Our approach allows sampling high-quality solutions to the Ising problem with high probability and outperforms both baseline heuristics and a black-box hyperparameter optimization approach.
Multiple Metric Learning for Structured Data
We address the problem of merging graph and feature-space information while learning a metric from structured data. Existing algorithms tackle the problem in an asymmetric way, by either extracting vectorized summaries of the graph structure or adding hard constraints to feature-space algorithms. Following a different path, we define a metric regression scheme where we train metric-constrained linear combinations of dissimilarity matrices. The idea is that the input matrices can be pre-computed dissimilarity measures obtained from any kind of available data (e.g. node attributes or edge structure). As the model inputs are distance measures, we do not need to assume the existence of any underlying feature space. Main challenge is that metric constraints (especially positive-definiteness and sub-additivity), are not automatically respected if, for example, the coefficients of the linear combination are allowed to be negative. Both positive and sub-additive constraints are linear inequalities, but the computational complexity of imposing them scales as O(D3), where D is the size of the input matrices (i.e. the size of the data set). This becomes quickly prohibitive, even when D is relatively small. We propose a new graph-based technique for optimizing under such constraints and show that, in some cases, our approach may reduce the original computational complexity of the optimization process by one order of magnitude. Contrarily to existing methods, our scheme applies to any (possibly non-convex) metric-constrained objective function.
Development of modeling and control strategies for an approximated Gaussian process
Cui, Shisheng, Chang, Chia-Jung
The Gaussian process (GP) model, which has been extensively applied as priors of functions, has demonstrated excellent performance. The specification of a large number of parameters affects the computational efficiency and the feasibility of implementation of a control strategy. We propose a linear model to approximate GPs; this model expands the GP model by a series of basis functions. Several examples and simulation studies are presented to demonstrate the advantages of the proposed method. A control strategy is provided with the proposed linear model. Keywords: Data mining, forecasting, stochastic processes, control strategies INTRODUCTION The Gaussian process (GP) is a powerful modeling tool that has many applications in research and practice. It provides a practical and probabilistic approach to learning in kernel machines. The GP is extensively applied as a prior of a true function.
Assortment Optimization with Repeated Exposures and Product-dependent Patience Cost
In this paper, we study the assortment optimization problem faced by many online retailers such as Amazon. We develop a \emph{cascade multinomial logit model}, based on the classic multinomial logit model, to capture the consumers' purchasing behavior across multiple stages. Different from existing studies, our model allows for repeated exposures of a product, i.e., the same product can be displayed multiple times across different stages. In addition, each consumer has a \emph{patience budget} that is sampled from a known distribution and each product is associated with a \emph{patience cost}, which captures the cognitive efforts spent on browsing that product. Given an assortment of products, a consumer sequentially browses them stage by stage. After browsing all products in one stage, if the utility of a product exceeds the utility of the outside option, the consumer proceeds to purchase the product and leave the platform. Otherwise, if the patience cost of all products browsed up to that point is no larger than her patience budget, she continues to view the next stage. We propose an approximation solution to this problem.
Exponential Step Sizes for Non-Convex Optimization
Li, Xiaoyu, Zhuang, Zhenxun, Orabona, Francesco
Stochastic Gradient Descent (SGD) is a popular tool in large scale optimization of machine learning objective functions. However, the performance is greatly variable, depending on the choice of the step sizes. In this paper, we introduce the exponential step sizes for stochastic optimization of smooth non-convex functions which satisfy the Polyak-\L{}ojasiewicz (PL) condition. We show that, without any information on the level of noise over the stochastic gradients, these step sizes guarantee a convergence rate for the last iterate that automatically interpolates between a linear rate (in the noisy-free case) and a $O(\frac{1}{T})$ rate (in the noisy case), up to poly-logarithmic factors. Moreover, if without the PL condition, the exponential step sizes still guarantee optimal convergence to a critical point, up to logarithmic factors. We also validate our theoretical results with empirical experiments on real-world datasets with deep learning architectures.
Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence
Raschka, Sebastian, Patterson, Joshua, Nolet, Corey
Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lies the tools and the methods that are driving it, from processing the massive piles of data generated each day to learning from and taking useful action. Deep neural networks, along with advancements in classical ML and scalable general-purpose GPU computing, have become critical components of artificial intelligence, enabling many of these astounding breakthroughs and lowering the barrier to adoption. Python continues to be the most preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. We cover widely-used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.
The {0,1}-knapsack problem with qualitative levels
Schäfer, Luca E., Dietz, Tobias, Barbati, Maria, Figueira, José Rui, Greco, Salvatore, Ruzika, Stefan
A variant of the classical knapsack problem is considered in which each item is associated with an integer weight and a qualitative level. We define a dominance relation over the feasible subsets of the given item set and show that this relation defines a preorder. We propose a dynamic programming algorithm to compute the entire set of non-dominated rank cardinality vectors and we state two greedy algorithms, which efficiently compute a single efficient solution.
Generalized Kernel-Based Dynamic Mode Decomposition
Heas, Patrick, Herzet, Cedric, Combes, Benoit
Reduced modeling in high-dimensional reproducing kernel Hilbert spaces offers the opportunity to approximate efficiently non-linear dynamics. In this work, we devise an algorithm based on low rank constraint optimization and kernel-based computation that generalizes a recent approach called "kernel-based dynamic mode decomposition". This new algorithm is characterized by a gain in approximation accuracy, as evidenced by numerical simulations, and in computational complexity.