Optimization
Stabilizing Value Iteration with and without Approximation Errors
Intelligent control using adaptive/approximate dynamic programming (ADP), sometimes referred to by reinforcement learning (RL) or neuro-dynamic programming (NDP), is a set of powerful tools for obtaining approximate solutions to difficult and mathematically intractable problems which seek optimum while sometimes even no knowledge of the system model/dynamics is available. The dramatic potential of the tools in practice has attracted many researchers within the last few decades, [1]- [13]. The multitude of appeared papers and success stories on applications of ADP to different problems, however, has intensified the need for firm mathematical analyses for guaranteeing the convergence of the learning processes and the stability of the results. Besides the classifications of heuristic dynamic programming (HDP), dual heuristic programming (DHP), etc. [7], which are in terms of the variables subject to approximation and their dependencies, the learning algorithms are typically based on either value iteration (VI) or policy iteration (PI), [3], [14]. These algorithms are well investigated both by computer scientists for machine learning [3] and by control scientists for feedback control of dynamical systems [14].
Theoretical and Numerical Analysis of Approximate Dynamic Programming with Approximation Errors
This study is aimed at answering the famous question of how the approximation errors at each iteration of Approximate Dynamic Programming (ADP) affect the quality of the final results considering the fact that errors at each iteration affect the next iteration. To this goal, convergence of Value Iteration scheme of ADP for deterministic nonlinear optimal control problems with undiscounted cost functions is investigated while considering the errors existing in approximating respective functions. The boundedness of the results around the optimal solution is obtained based on quantities which are known in a general optimal control problem and assumptions which are verifiable. Moreover, since the presence of the approximation errors leads to the deviation of the results from optimality, sufficient conditions for stability of the system operated by the result obtained after a finite number of value iterations, along with an estimation of its region of attraction, are derived in terms of a calculable upper bound of the control approximation error. Finally, the process of implementation of the method on an orbital maneuver problem is investigated through which the assumptions made in the theoretical developments are verified and the sufficient conditions are applied for guaranteeing stability and near optimality.
Newton Sketch: A Linear-time Optimization Algorithm with Linear-Quadratic Convergence
Pilanci, Mert, Wainwright, Martin J.
We propose a randomized second-order method for optimization known as the Newton Sketch: it is based on performing an approximate Newton step using a randomly projected or sub-sampled Hessian. For self-concordant functions, we prove that the algorithm has super-linear convergence with exponentially high probability, with convergence and complexity guarantees that are independent of condition numbers and related problem-dependent quantities. Given a suitable initialization, similar guarantees also hold for strongly convex and smooth objectives without self-concordance. When implemented using randomized projections based on a sub-sampled Hadamard basis, the algorithm typically has substantially lower complexity than Newton's method. We also describe extensions of our methods to programs involving convex constraints that are equipped with self-concordant barriers. We discuss and illustrate applications to linear programs, quadratic programs with convex constraints, logistic regression and other generalized linear models, as well as semidefinite programs.
Exact and Heuristic Algorithms for Semi-Nonnegative Matrix Factorization
Gillis, Nicolas, Kumar, Abhishek
Given a matrix $M$ (not necessarily nonnegative) and a factorization rank $r$, semi-nonnegative matrix factorization (semi-NMF) looks for a matrix $U$ with $r$ columns and a nonnegative matrix $V$ with $r$ rows such that $UV$ is the best possible approximation of $M$ according to some metric. In this paper, we study the properties of semi-NMF from which we develop exact and heuristic algorithms. Our contribution is threefold. First, we prove that the error of a semi-NMF of rank $r$ has to be smaller than the best unconstrained approximation of rank $r-1$. This leads us to a new initialization procedure based on the singular value decomposition (SVD) with a guarantee on the quality of the approximation. Second, we propose an exact algorithm (that is, an algorithm that finds an optimal solution), also based on the SVD, for a certain class of matrices (including nonnegative irreducible matrices) from which we derive an initialization for matrices not belonging to that class. Numerical experiments illustrate that this second approach performs extremely well, and allows us to compute optimal semi-NMF decompositions in many situations. Finally, we analyze the computational complexity of semi-NMF proving its NP-hardness, already in the rank-one case (that is, for $r = 1$), and we show that semi-NMF is sometimes ill-posed (that is, an optimal solution does not exist).
Generalized Low Rank Models
Udell, Madeleine, Horn, Corinne, Zadeh, Reza, Boyd, Stephen
Principal components analysis (PCA) is a well-known technique for approximating a tabular data set by a low rank matrix. Here, we extend the idea of PCA to handle arbitrary data sets consisting of numerical, Boolean, categorical, ordinal, and other data types. This framework encompasses many well known techniques in data analysis, such as nonnegative matrix factorization, matrix completion, sparse and robust PCA, $k$-means, $k$-SVD, and maximum margin matrix factorization. The method handles heterogeneous data sets, and leads to coherent schemes for compressing, denoising, and imputing missing entries across all data types simultaneously. It also admits a number of interesting interpretations of the low rank factors, which allow clustering of examples or of features. We propose several parallel algorithms for fitting generalized low rank models, and describe implementations and numerical results.
An Explicit Sampling Dependent Spectral Error Bound for Column Subset Selection
Yang, Tianbao, Zhang, Lijun, Jin, Rong, Zhu, Shenghuo
In this paper, we consider the problem of column subset selection. We present a novel analysis of the spectral norm reconstruction for a simple randomized algorithm and establish a new bound that depends explicitly on the sampling probabilities. The sampling dependent error bound (i) allows us to better understand the tradeoff in the reconstruction error due to sampling probabilities, (ii) exhibits more insights than existing error bounds that exploit specific probability distributions, and (iii) implies better sampling distributions. In particular, we show that a sampling distribution with probabilities proportional to the square root of the statistical leverage scores is always better than uniform sampling and is better than leverage-based sampling when the statistical leverage scores are very nonuniform. And by solving a constrained optimization problem related to the error bound with an efficient bisection search we are able to achieve better performance than using either the leverage-based distribution or that proportional to the square root of the statistical leverage scores. Numerical simulations demonstrate the benefits of the new sampling distributions for low-rank matrix approximation and least square approximation compared to state-of-the art algorithms.
Penalized versus constrained generalized eigenvalue problems
Gaynanova, Irina, Booth, James, Wells, Martin T.
We investigate the difference between using an $\ell_1$ penalty versus an $\ell_1$ constraint in generalized eigenvalue problems, such as principal component analysis and discriminant analysis. Our main finding is that an $\ell_1$ penalty may fail to provide very sparse solutions; a severe disadvantage for variable selection that can be remedied by using an $\ell_1$ constraint. Our claims are supported both by empirical evidence and theoretical analysis. Finally, we illustrate the advantages of an $\ell_1$ constraint in the context of discriminant analysis and principal component analysis.
Learning the Structure and Parameters of Large-Population Graphical Games from Behavioral Data
We consider learning, from strictly behavioral data, the structure and parameters of linear influence games (LIGs), a class of parametric graphical games introduced by Irfan and Ortiz (2014). LIGs facilitate causal strategic inference (CSI): Making inferences from causal interventions on stable behavior in strategic settings. Applications include the identification of the most influential individuals in large (social) networks. Such tasks can also support policy-making analysis. Motivated by the computational work on LIGs, we cast the learning problem as maximum-likelihood estimation (MLE) of a generative model defined by pure-strategy Nash equilibria (PSNE). Our simple formulation uncovers the fundamental interplay between goodness-of-fit and model complexity: good models capture equilibrium behavior within the data while controlling the true number of equilibria, including those unobserved. We provide a generalization bound establishing the sample complexity for MLE in our framework. We propose several algorithms including convex loss minimization (CLM) and sigmoidal approximations. We prove that the number of exact PSNE in LIGs is small, with high probability; thus, CLM is sound. We illustrate our approach on synthetic data and real-world U.S. congressional voting records. We briefly discuss our learning framework's generality and potential applicability to general graphical games.
Structured Block Basis Factorization for Scalable Kernel Matrix Evaluation
Wang, Ruoxi, Li, Yingzhou, Mahoney, Michael W., Darve, Eric
Kernel matrices are popular in machine learning and scientific computing, but they are limited by their quadratic complexity in both construction and storage. It is well-known that as one varies the kernel parameter, e.g., the width parameter in radial basis function kernels, the kernel matrix changes from a smooth low-rank kernel to a diagonally-dominant and then fully-diagonal kernel. Low-rank approximation methods have been widely-studied, mostly in the first case, to reduce the memory storage and the cost of computing matrix-vector products. Here, we use ideas from scientific computing to propose an extension of these methods to situations where the matrix is not well-approximated by a low-rank matrix. In particular, we construct an efficient block low-rank approximation method---which we call the Block Basis Factorization---and we show that it has $\mathcal{O}(n)$ complexity in both time and memory. Our method works for a wide range of kernel parameters, extending the domain of applicability of low-rank approximation methods, and our empirical results demonstrate the stability (small standard deviation in error) and superiority over current state-of-art kernel approximation algorithms.
Monotonous (Semi-)Nonnegative Matrix Factorization
NMF suffers from the scale and ordering ambiguities. Often, the source signals can be monotonous in nature. For example, in source separation problem, the source signals can be monotonously increasing or decreasing while the mixing matrix can have nonnegative entries. NMF methods may not be effective for such cases as it suffers from the ordering ambiguity. This paper proposes an approach to incorporate notion of monotonicity in NMF, labeled as monotonous NMF. An algorithm based on alternating least-squares is proposed for recovering monotonous signals from a data matrix. Further, the assumption on mixing matrix is relaxed to extend monotonous NMF for data matrix with real numbers as entries. The approach is illustrated using synthetic noisy data. The results obtained by monotonous NMF are compared with standard NMF algorithms in the literature, and it is shown that monotonous NMF estimates source signals well in comparison to standard NMF algorithms when the underlying sources signals are monotonous.