Optimization
Resilient Non-Submodular Maximization over Matroid Constraints
Tzoumas, Vasileios, Jadbabaie, Ali, Pappas, George J.
Applications in control, robotics, and optimization motivate the design of systems by selecting system elements, such as actuators, sensors, or data, subject to complex design constraints that require the system elements not only to be a few in number, but also, to satisfy heterogeneity or global-interdependency constraints; in particular, matroid constraints. However, in failure-prone and adversarial environments, sensors get attacked; actuators fail; data get deleted. Thence, traditional matroid-constrained design paradigms become insufficient and, in contrast, resilient matroid-constrained designs against attacks, failures, or deletions become important. In general, resilient matroid-constrained design problems are computationally hard. Also, even though they often involve objective functions that are monotone and (possibly) submodular, no scalable approximation algorithms are known for their solution. In this paper, we provide the first algorithm, that achieves the following characteristics: system-wide resiliency, i.e., the algorithm is valid for any number of denial-of-service attacks, deletions, or failures; minimal running time, i.e., the algorithm terminates with the same running time as state-of-the-art algorithms for (non-resilient) matroid-constrained optimization; and provable approximation performance, i.e., the algorithm guarantees for monotone objective functions a solution close to the optimal. We quantify the algorithm's approximation performance using a notion of curvature for monotone (not necessarily submodular) set functions. Finally, we support our theoretical analyses with numerical experiments, by considering a control-aware sensor selection scenario, namely, sensing-constrained robot navigation.
Estimation of Markov Chain via Rank-constrained Likelihood
Li, Xudong, Wang, Mengdi, Zhang, Anru
This paper studies the recovery and state compression of low-rank Markov chains from empirical trajectories. We propose a non-convex estimator based on rank-constrained likelihood maximization. Statistical upper bounds are provided for the Kullback-Leiber divergence and the $\ell_2$ risk between the estimator and the true transition matrix. The estimator reveals a compressed state space of the Markov chain. We also develop a novel DC (difference of convex function) programming algorithm to tackle the rank-constrained non-smooth optimization problem. Convergence results are established. Experiments with taxi trip data show that the estimator is able to identify the zoning of Manhattan city.
Recursive Optimization of Convex Risk Measures: Mean-Semideviation Models
Kalogerias, Dionysios S., Powell, Warren B.
We develop and analyze stochastic subgradient methods for optimizing a new, versatile, application-friendly and tractable class of convex risk measures, termed here as mean-semideviations. Their construction relies on on the concept of a risk regularizer, a one-dimensional nonlinear map with certain properties, essentially generalizing the positive part weighting function in the mean-upper-semideviation risk measure. After we formally introduce mean-semideviations, we study their basic properties, and we present a fundamental constructive characterization result, demonstrating their generality. We then introduce and rigorously analyze the MESSAGEp algorithm, an efficient stochastic subgradient procedure for iteratively solving convex mean-semideviation risk-averse problems to optimality. The MESSAGEp algorithm may be derived as an application of the T-SCGD algorithm of (Yang et al., 2018). However, the generic theoretical framework of (Yang et al., 2018) is too narrow and structurally restrictive, as far as optimization of mean-semideviations is concerned, including the classical mean-upper-semideviation risk measure. By exploiting problem structure, we propose a substantially weaker theoretical framework, under which we establish pathwise convergence of the MESSAGEp algorithm, under the same strong sense as in (Yang et al., 2018). The new framework reveals a fundamental trade-off between the smoothness of the random position function and that of the particular mean-semideviation risk measure under consideration. Further, we explicitly show that the class of mean-semideviation problems supported under our framework is strictly larger than the respective class of problems supported in (Yang et al., 2018). Thus, applicability of compositional stochastic optimization is established for a strictly wider spectrum of mean-semideviation problems, justifying the purpose of our work.
Aggregated Momentum: Stability Through Passive Damping
Lucas, James, Zemel, Richard, Grosse, Roger
Momentum is a simple and widely used trick which allows gradient-based optimizers to pick up speed in low curvature directions. Its performance depends crucially on a damping coefficient $\beta$. Large $\beta$ values can potentially deliver much larger speedups, but are prone to oscillations and instability; hence one typically resorts to small values such as 0.5 or 0.9. We propose Aggregated Momentum (AggMo), a variant of momentum which combines multiple velocity vectors with different $\beta$ parameters. AggMo is trivial to implement, but significantly dampens oscillations, enabling it to remain stable even for aggressive $\beta$ values such as 0.999. We reinterpret Nesterov's accelerated gradient descent as a special case of AggMo and provide theoretical convergence bounds for online convex optimization. Empirically, we find that AggMo is a suitable drop-in replacement for other momentum methods, and frequently delivers faster convergence.
Fundamental Resource Trade-offs for Encoded Distributed Optimization
Avestimehr, A. Salman, Kalan, Seyed Mohammadreza Mousavi, Soltanolkotabi, Mahdi
Dealing with the shear size and complexity of today's massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck that arises in such modern distributed computing environments is that some of the worker nodes may run slow. These nodes a.k.a.~stragglers can significantly slow down computation as the slowest node may dictate the overall computational time. A recent computational framework, called encoded optimization, creates redundancy in the data to mitigate the effect of stragglers. In this paper we develop novel mathematical understanding for this framework demonstrating its effectiveness in much broader settings than was previously understood. We also analyze the convergence behavior of iterative encoded optimization algorithms, allowing us to characterize fundamental trade-offs between convergence rate, size of data set, accuracy, computational load (or data redundancy), and straggler toleration in this framework.
Locally Convex Sparse Learning over Networks
Zaki, Ahmed, Chatterjee, Saikat, Mitra, Partha P., Rasmussen, Lars K.
We consider a distributed learning setup where a sparse signal is estimated over a network. Our main interest is to save communication resource for information exchange over the network and reduce processing time. Each node of the network uses a convex optimization based algorithm that provides a locally optimum solution for that node. The nodes exchange their signal estimates over the network in order to refine their local estimates. At a node, the optimization algorithm is based on an $\ell_1$-norm minimization with appropriate modifications to promote sparsity as well as to include influence of estimates from neighboring nodes. Our expectation is that local estimates in each node improve fast and converge, resulting in a limited demand for communication of estimates between nodes and reducing the processing time. We provide restricted-isometry-property (RIP)-based theoretical analysis on estimation quality. In the scenario of clean observation, it is shown that the local estimates converge to the exact sparse signal under certain technical conditions. Simulation results show that the proposed algorithms show competitive performance compared to a globally optimum distributed LASSO algorithm in the sense of convergence speed and estimation error.
Joint Optimization Framework for Learning with Noisy Labels
Tanaka, Daiki, Ikami, Daiki, Yamasaki, Toshihiko, Aizawa, Kiyoharu
Deep neural networks (DNNs) trained on large-scale datasets have exhibited significant performance in image classification. Many large-scale datasets are collected from websites, however they tend to contain inaccurate labels that are termed as noisy labels. Training on such noisy labeled datasets causes performance degradation because DNNs easily overfit to noisy labels. To overcome this problem, we propose a joint optimization framework of learning DNN parameters and estimating true labels. Our framework can correct labels during training by alternating update of network parameters and labels. We conduct experiments on the noisy CIFAR-10 datasets and the Clothing1M dataset. The results indicate that our approach significantly outperforms other state-of-the-art methods.
Distributed Constraint Optimization Problems and Applications: A Survey
Fioretto, Ferdinando, Pontelli, Enrico, Yeoh, William
The field of multi-agent system (MAS) is an active area of research within artificial intelligence, with an increasingly important impact in industrial and other real-world applications. In a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as a prominent agent model to govern the agents' autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have been proposed to enable support of MAS in complex, real-time, and uncertain environments. This survey provides an overview of the DCOP model, offering a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.
A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations
As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine learning and predictive modeling, distributed optimization methods have recently garnered ample attention, in particular when either observations or features are distributed, but not both. We propose a general stochastic algorithm where observations, features, and gradient components can be sampled in a double distributed setting, i.e., with both features and observations distributed. Very technical analyses establish convergence properties of the algorithm under different conditions on the learning rate (diminishing to zero or constant). Computational experiments in Spark demonstrate a superior performance of our algorithm versus a benchmark in early iterations of the algorithm, which is due to the stochastic components of the algorithm.
Efficient First-Order Algorithms for Adaptive Signal Denoising
Ostrovskii, Dmitrii, Harchaoui, Zaid
We consider the problem of discrete-time signal denoising, focusing on a specific family of non-linear convolution-type estimators. Each such estimator is associated with a time-invariant filter which is obtained adaptively, by solving a certain convex optimization problem. Adaptive convolution-type estimators were demonstrated to have favorable statistical properties. However, the question of their computational complexity remains largely unexplored, and in fact we are not aware of any publicly available implementation of these estimators. Our first contribution is an efficient implementation of these estimators via some known first-order proximal algorithms. Our second contribution is a computational complexity analysis of the proposed procedures, which takes into account their statistical nature and the related notion of statistical accuracy. The proposed procedures and their analysis are illustrated on a simulated data benchmark.