Optimization
Robust Nonnegative Matrix Factorization via $L_1$ Norm Regularization
Shen, Bin, Si, Luo, Ji, Rongrong, Liu, Baodi
Nonnegative Matrix Factorization (NMF) is a widely used technique in many applications such as face recognition, motion segmentation, etc. It approximates the nonnegative data in an original high dimensional space with a linear representation in a low dimensional space by using the product of two nonnegative matrices. In many applications data are often partially corrupted with large additive noise. When the positions of noise are known, some existing variants of NMF can be applied by treating these corrupted entries as missing values. However, the positions are often unknown in many real world applications, which prevents the usage of traditional NMF or other existing variants of NMF. This paper proposes a Robust Nonnegative Matrix Factorization (RobustNMF) algorithm that explicitly models the partial corruption as large additive noise without requiring the information of positions of noise. In practice, large additive noise can be used to model outliers. In particular, the proposed method jointly approximates the clean data matrix with the product of two nonnegative matrices and estimates the positions and values of outliers/noise. An efficient iterative optimization algorithm with a solid theoretical justification has been proposed to learn the desired matrix factorization. Experimental results demonstrate the advantages of the proposed algorithm.
Matrix Completion from Noisy Entries
Keshavan, Raghunandan H., Montanari, Andrea, Oh, Sewoong
Given a matrix M of low-rank, we consider the problem of reconstructing it from noisy observations of a small, random subset of its entries. The problem arises in a variety of applications, from collaborative filtering (the `Netflix problem') to structure-from-motion and positioning. We study a low complexity algorithm introduced by Keshavan et al.(2009), based on a combination of spectral techniques and manifold optimization, that we call here OptSpace. We prove performance guarantees that are order-optimal in a number of circumstances.
Randomized Smoothing for Stochastic Optimization
Duchi, John C., Bartlett, Peter L., Wainwright, Martin J.
We analyze convergence rates of stochastic optimization procedures for non-smooth convex optimization problems. By combining randomized smoothing techniques with accelerated gradient methods, we obtain convergence rates of stochastic optimization procedures, both in expectation and with high probability, that have optimal dependence on the variance of the gradient estimates. To the best of our knowledge, these are the first variance-based rates for non-smooth optimization. We give several applications of our results to statistical estimation problems, and provide experimental results that demonstrate the effectiveness of the proposed algorithms. We also describe how a combination of our algorithm with recent work on decentralized optimization yields a distributed stochastic optimization algorithm that is order-optimal.
Fast projections onto mixed-norm balls with applications
Joint sparsity offers powerful structural cues for feature selection, especially for variables that are expected to demonstrate a "grouped" behavior. Such behavior is commonly modeled via group-lasso, multitask lasso, and related methods where feature selection is effected via mixed-norms. Several mixed-norm based sparse models have received substantial attention, and for some cases efficient algorithms are also available. Surprisingly, several constrained sparse models seem to be lacking scalable algorithms. We address this deficiency by presenting batch and online (stochastic-gradient) optimization methods, both of which rely on efficient projections onto mixed-norm balls. We illustrate our methods by applying them to the multitask lasso. We conclude by mentioning some open problems.
A Study of Phase Transitions in Security Games
Jain, Manish (University of Southern California) | Leyton-Brown, Kevin (University of British Columbia) | Tambe, Milind (University of Southern California)
Stackelberg security games form the backbone of systems like ARMOR, IRIS and PROTECT, which are in regular use by the Los Angeles International Police, US Federal Air Marshal Service and the US Coast Guard respectively. An understanding of the runtime required by algorithms that power such systems is critical to furthering the application of game theory to other real-world domains. This paper identifies the concept of the deployment-to-saturation ratio in random Stackelberg security games, and shows that in a decision problem related to these games, the probability that a solution exists exhibits a phase transition as the ratio crosses 0.5. We demonstrate that this phase transition is invariant to changes both in the domain and the domain representation. Moreover, problem instances at this phase transition point are computationally harder than instances with other deployment-to-saturation ratios for a wide range of different equilibrium computation methods, including (i) previously published different MIP algorithms, and (ii) different underlying solvers and solution mechanisms. Our findings have at least two important implications. First, it is important for new algorithms to be evaluated on the hardest problem instances. We show that this has often not been done in the past, and introduce a publicly available benchmark suite to facilitate such comparisons. Second, we provide evidence that this phase transition region is also one where optimization would be of most benefit to security agencies, and thus requires significant attention from researchers in this area.
Adversarial Patrolling Games
Vorobeychik, Yevgeniy (University of Pennsylvania) | An, Bo (University of Southern California) | Tambe, Milind (University of Southern California)
Defender-Attacker Stackelberg games are the foundations of toolsdeployed for computing optimal patrolling strategies in adversarialdomains such as the United states Federal Air Marshals Service and the UnitedStates Coast Guard, among others.In Stackelberg game models of these systems the attacker knows only theprobability that each target is covered by the defender, but isoblivious to the detailed timing of the coverage schedule.In many real-world situations, however, the attacker can observe thecurrent location of the defender and can exploit this knowledge toreason about the defender's future moves.We study Stackelberg security games in which the defender sequentiallymoves between targets, with moves constrained by an exogenouslyspecified graph, while the attacker can observe the defender's currentlocation and his (stochastic) policy concerning future moves. We offerfive contributions: (1) We model this adversarial patrolling game (APG) as a stochastic game with special structure and presentseveral alternative formulations that leverage the general non-linearprogramming (NLP) approach for computing equilibria in zero-sumstochastic games. We show that our formulations yield significantlybetter solutions than previous approaches. (2) We extend theNLP formulation for APG allow for attacks that may take multiple timesteps to unfold.(3) We provide anapproximate MILP formulation that uses discrete defender moveprobabilities. (4) We experimentally demonstrate the efficacy of anNLP-based approach, and systematically study the impact of networktopology on the results.(5) We extend our model to allow the defender to construct the graph constraining his moves, at some cost, and offer novel algorithms for this setting, finding that a MILP approximation is much more effective than the exact NLP in this setting.
A Regularization Approach for Prediction of Edges and Node Features in Dynamic Graphs
Richard, Emile, Argyriou, Andreas, Evgeniou, Theodoros, Vayatis, Nicolas
Forecasting the behavior of systems with multiple responses has been a challenging problem in the context of many applications such as collaborative filtering, financial markets, or bioinformatics, where responses may be, respectively, movie ratings, stock prices, or activity of genes within a cell. Statistical modeling techniques have been widely applied for learning multivariate time series either in the multiple linear regression setting [3] or with autoregressive models [19]. More recently, kernel-based regularized methods have been developed for multitask learning [7, 2]. These approaches share in common the use of the correlation structure between input variables to enhance prediction of every single output. Frequently, the correlation structure is assumed to be given or is estimated separately.
A Convex Formulation for Learning Task Relationships in Multi-Task Learning
Multi-task learning is a learning paradigm which seeks to improve the generalization performance of a learning task with the help of some other related tasks. In this paper, we propose a regularization formulation for learning the relationships between tasks in multi-task learning. This formulation can be viewed as a novel generalization of the regularization framework for single-task learning. Besides modeling positive task correlation, our method, called multi-task relationship learning (MTRL), can also describe negative task correlation and identify outlier tasks based on the same underlying principle. Under this regularization framework, the objective function of MTRL is convex. For efficiency, we use an alternating method to learn the optimal model parameters for each task as well as the relationships between tasks. We study MTRL in the symmetric multi-task learning setting and then generalize it to the asymmetric setting as well. We also study the relationships between MTRL and some existing multi-task learning methods. Experiments conducted on a toy problem as well as several benchmark data sets demonstrate the effectiveness of MTRL.
Negative Tree Reweighted Belief Propagation
Liu, Qiang, Ihler, Alexander T.
We introduce a new class of lower bounds on the log partition function of a Markov random field which makes use of a reversed Jensen's inequality. In particular, our method approximates the intractable distribution using a linear combination of spanning trees with negative weights. This technique is a lower-bound counterpart to the tree-reweighted belief propagation algorithm, which uses a convex combination of spanning trees with positive weights to provide corresponding upper bounds. We develop algorithms to optimize and tighten the lower bounds over the non-convex set of valid parameter values. Our algorithm generalizes mean field approaches (including naïve and structured mean field approximations), which it includes as a limiting case.
Robust Metric Learning by Smooth Optimization
Huang, Kaizhu, Jin, Rong, Xu, Zenglin, Liu, Cheng-Lin
Most existing distance metric learning methods assume perfect side information that is usually given in pairwise or triplet constraints. Instead, in many real-world applications, the constraints are derived from side information, such as users' implicit feedbacks and citations among articles. As a result, these constraints are usually noisy and contain many mistakes. In this work, we aim to learn a distance metric from noisy constraints by robust optimization in a worst-case scenario, to which we refer as robust metric learning. We formulate the learning task initially as a combinatorial optimization problem, and show that it can be elegantly transformed to a convex programming problem. We present an efficient learning algorithm based on smooth optimization [7]. It has a worst-case convergence rate of O(1/{\surd}{\varepsilon}) for smooth optimization problems, where {\varepsilon} is the desired error of the approximate solution. Finally, our empirical study with UCI data sets demonstrate the effectiveness of the proposed method in comparison to state-of-the-art methods.