Mathematical & Statistical Methods
Recursive Compressed Sensing
Freris, Nikolaos M., รรงal, Orhan, Vetterli, Martin
We introduce a recursive algorithm for performing compressed sensing on streaming data. The approach consists of a) recursive encoding, where we sample the input stream via overlapping windowing and make use of the previous measurement in obtaining the next one, and b) recursive decoding, where the signal estimate from the previous window is utilized in order to achieve faster convergence in an iterative optimization scheme applied to decode the new one. To remove estimation bias, a two-step estimation procedure is proposed comprising support set detection and signal amplitude estimation. Estimation accuracy is enhanced by a non-linear voting method and averaging estimates over multiple windows. We analyze the computational complexity and estimation error, and show that the normalized error variance asymptotically goes to zero for sublinear sparsity. Our simulation results show speed up of an order of magnitude over traditional CS, while obtaining significantly lower reconstruction error under mild conditions on the signal magnitudes and the noise level.
Algorithm Runtime Prediction: Methods & Evaluation
Hutter, Frank, Xu, Lin, Hoos, Holger H., Leyton-Brown, Kevin
Perhaps surprisingly, it is possible to predict how long an algorithm will take to run on a previously unseen input, using machine learning techniques to build a model of the algorithm's runtime as a function of problem-specific instance features. Such models have important applications to algorithm analysis, portfolio-based algorithm selection, and the automatic configuration of parameterized algorithms. Over the past decade, a wide variety of techniques have been studied for building such models. Here, we describe extensions and improvements of existing models, new families of models, and -- perhaps most importantly -- a much more thorough treatment of algorithm parameters as model inputs. We also comprehensively describe new and existing features for predicting algorithm runtime for propositional satisfiability (SAT), travelling salesperson (TSP) and mixed integer programming (MIP) problems. We evaluate these innovations through the largest empirical analysis of its kind, comparing to a wide range of runtime modelling techniques from the literature. Our experiments consider 11 algorithms and 35 instance distributions; they also span a very wide range of SAT, MIP, and TSP instances, with the least structured having been generated uniformly at random and the most structured having emerged from real industrial applications. Overall, we demonstrate that our new models yield substantially better runtime predictions than previous approaches in terms of their generalization to new problem instances, to new algorithms from a parameterized space, and to both simultaneously.
Spectral Clustering with Epidemic Diffusion
Smith, Laura M., Lerman, Kristina, Garcia-Cardona, Cristina, Percus, Allon G., Ghosh, Rumi
Spectral clustering is widely used to partition graphs into distinct modules or communities. Existing methods for spectral clustering use the eigenvalues and eigenvectors of the graph Laplacian, an operator that is closely associated with random walks on graphs. We propose a new spectral partitioning method that exploits the properties of epidemic diffusion. An epidemic is a dynamic process that, unlike the random walk, simultaneously transitions to all the neighbors of a given node. We show that the replicator, an operator describing epidemic diffusion, is equivalent to the symmetric normalized Laplacian of a reweighted graph with edges reweighted by the eigenvector centralities of their incident nodes. Thus, more weight is given to edges connecting more central nodes. We describe a method that partitions the nodes based on the componentwise ratio of the replicator's second eigenvector to the first, and compare its performance to traditional spectral clustering techniques on synthetic graphs with known community structure. We demonstrate that the replicator gives preference to dense, clique-like structures, enabling it to more effectively discover communities that may be obscured by dense intercommunity linking.
Random walk kernels and learning curves for Gaussian process regression on random graphs
We consider learning on graphs, guided by kernels that encode similarity between vertices. Our focus is on random walk kernels, the analogues of squared exponential kernels in Euclidean spaces. We show that on large, locally treelike, graphs these have some counter-intuitive properties, specifically in the limit of large kernel lengthscales. We consider using these kernels as covariance matrices of e.g.\ Gaussian processes (GPs). In this situation one typically scales the prior globally to normalise the average of the prior variance across vertices. We demonstrate that, in contrast to the Euclidean case, this generically leads to significant variation in the prior variance across vertices, which is undesirable from the probabilistic modelling point of view. We suggest the random walk kernel should be normalised locally, so that each vertex has the same prior variance, and analyse the consequences of this by studying learning curves for Gaussian process regression. Numerical calculations as well as novel theoretical predictions for the learning curves using belief propagation make it clear that one obtains distinctly different probabilistic models depending on the choice of normalisation. Our method for predicting the learning curves using belief propagation is significantly more accurate than previous approximations and should become exact in the limit of large random graphs.
Extended Distributed Learning Automata:A New Method for Solving Stochastic Graph Optimization Problems
Meybodi, M. R. Mollakhalili, Meybodi, M. R.
In this paper, a new structure of cooperative learning automata so-called extended learning automata (eDLA) is introduced. Based on the proposed structure, a new iterative randomized heuristic algorithm for finding optimal sub-graph in a stochastic edge-weighted graph through sampling is proposed. It has been shown that the proposed algorithm based on new networked-structure can be to solve the optimization problems on stochastic graph through less number of sampling in compare to standard sampling. Stochastic graphs are graphs in which the edges have an unknown distribution probability weights. Proposed algorithm uses an eDLA to find a policy that leads to an induced sub-graph that satisfies some restrictions such as minimum or maximum weight (length). At each stage of the proposed algorithm, eDLA determines which edges to be sampled. This eDLA-based proposed sampling method may result in decreasing unnecessary samples and hence decreasing the time that algorithm requires for finding the optimal sub-graph. It has been shown that proposed method converge to optimal solution, furthermore the probability of this convergence can be made arbitrarily close to 1 by using a sufficiently small learning rate. A new variance-aware threshold value was proposed that can be improving significantly convergence rate of the proposed eDLA-based algorithm. It has been shown that the proposed algorithm is competitive in terms of the quality of the solution
Improved Integer Programming Approaches for Chance-Constrained Stochastic Programming
Yanagisawa, Hiroki (IBM Japan) | Osogami, Takayuki (IBM Japan)
The Chance-Constrained Stochastic Programming (CCSP) is one of the models for decision making under uncertainty.ย In this paper, we consider the special case of the CCSP in which only the right-hand side vector is random with a discrete distribution having a finite support.ย The unit commitment problem is one of the applications of the special case of the CCSP.ย Existing methods for exactly solving the CCSP problems require an enumeration of scenarios when they model a CCSP problem using a Mixed Integer Programming (MIP).ย We show how to reduce the number of scenarios enumerated in the MIP model.ย In addition, we give another compact MIP formulation to approximately solve the CCSP problems.
Bayesian Optimization in High Dimensions via Random Embeddings
Wang, Ziyu (University of British Columbia) | Zoghi, Masrour (University of Amsterdam) | Hutter, Frank (Freiburg University) | Matheson, David (Universityย of British Columbia) | Freitas, Nando de (Universityย of British Columbia)
Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple and applies to domains with both categorical and continuous variables. The experiments demonstrate that REMBO can effectively solve high-dimensional problems, including automatic parameter configuration of a popular mixed integer linear programming solver.
Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances
Optimal transportation distances are a fundamental family of parameterized distances for histograms. Despite their appealing theoretical properties, excellent performance in retrieval tasks and intuitive formulation, their computation involves the resolution of a linear program whose cost is prohibitive whenever the histograms' dimension exceeds a few hundreds. We propose in this work a new family of optimal transportation distances that look at transportation problems from a maximum-entropy perspective. We smooth the classical optimal transportation problem with an entropic regularization term, and show that the resulting optimum is also a distance which can be computed through Sinkhorn-Knopp's matrix scaling algorithm at a speed that is several orders of magnitude faster than that of transportation solvers. We also report improved performance over classical optimal transportation distances on the MNIST benchmark problem.
Structural and Functional Discovery in Dynamic Networks with Non-negative Matrix Factorization
Mankad, Shawn, Michailidis, George
Due to advances in data collection technologies, it is becoming increasingly common to study time series of networks. An important research question is how to discover the underlying structure and dynamics in time-varying networked systems. In this work, we propose a new matrix factorization-based approach for community discovery and visual exploration within potentially weighted and directed network time-series. Next, we review and discuss this work in relation to popular approaches for addressing the key problems of community detection and visualization of time series of networks. There have been many important contributions for community detection in network time-series, extensively reviewed in [1, 2], from the fields of physics, computer science and statistics. The basic goal of community detection is to extract groups of nodes that feature relatively dense within group connectivity and sparser between group connections [3, 4]. A common strategy is to embed the graphs in low-dimensional latent spaces. For instance, [5] use latent variables to capture groups of papers that evolve similarly in citation network data.
Factoring nonnegative matrices with linear programs
Bittorf, Victor, Recht, Benjamin, Re, Christopher, Tropp, Joel A.
This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization where the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C such that X approximately equals CX and some linear constraints. The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. A theoretical analysis demonstrates that this approach has guarantees similar to those of the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method extends to more general noise models and leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation can factor a multigigabyte matrix in a matter of minutes.