Optimization
Analog Sparse Approximation with Applications to Compressed Sensing
Charles, Adam S., Garrigues, Pierre, Rozell, Christopher J.
Recent research has shown that performance in signal processing tasks can often be significantly improved by using signal models based on sparse representations, where a signal is approximated using a small number of elements from a fixed dictionary. Unfortunately, inference in this model involves solving non-smooth optimization problems that are computationally expensive. While significant efforts have focused on developing digital algorithms specifically for this problem, these algorithms are inappropriate for many applications because of the time and power requirements necessary to solve large optimization problems. Based on recent work in computational neuroscience, we explore the potential advantages of continuous time dynamical systems for solving sparse approximation problems if they were implemented in analog VLSI. Specifically, in the simulated task of recovering synthetic and MRI data acquired via compressive sensing techniques, we show that these systems can potentially perform recovery at time scales of 10-20{\mu}s, supporting datarates of 50-100 kHz (orders of magnitude faster that digital algorithms). Furthermore, we show analytically that a wide range of sparse approximation problems can be solved in the same basic architecture, including approximate $\ell^p$ norms, modified $\ell^1$ norms, re-weighted $\ell^1$ and $\ell^2$, the block $\ell^1$ norm and classic Tikhonov regularization.
Fast Learning Rate of Non-Sparse Multiple Kernel Learning and Optimal Regularization Strategies
In this paper, we give a new generalization error bound of Multiple Kernel Learning (MKL) for a general class of regularizations, and discuss what kind of regularization gives a favorable predictive accuracy. Our main target in this paper is dense type regularizations including \ellp-MKL. According to the recent numerical experiments, the sparse regularization does not necessarily show a good performance compared with dense type regularizations. Motivated by this fact, this paper gives a general theoretical tool to derive fast learning rates of MKL that is applicable to arbitrary mixed-norm-type regularizations in a unifying manner. This enables us to compare the generalization performances of various types of regularizations. As a consequence, we observe that the homogeneity of the complexities of candidate reproducing kernel Hilbert spaces (RKHSs) affects which regularization strategy (\ell1 or dense) is preferred. In fact, in homogeneous complexity settings where the complexities of all RKHSs are evenly same, \ell1-regularization is optimal among all isotropic norms. On the other hand, in inhomogeneous complexity settings, dense type regularizations can show better learning rate than sparse \ell1-regularization. We also show that our learning rate achieves the minimax lower bound in homogeneous complexity settings.
An Iterative Algorithm for Fitting Nonconvex Penalized Generalized Linear Models with Grouped Predictors
High-dimensional data pose challenges in statistical learning and modeling. Sometimes the predictors can be naturally grouped where pursuing the between-group sparsity is desired. Collinearity may occur in real-world high-dimensional applications where the popular $l_1$ technique suffers from both selection inconsistency and prediction inaccuracy. Moreover, the problems of interest often go beyond Gaussian models. To meet these challenges, nonconvex penalized generalized linear models with grouped predictors are investigated and a simple-to-implement algorithm is proposed for computation. A rigorous theoretical result guarantees its convergence and provides tight preliminary scaling. This framework allows for grouped predictors and nonconvex penalties, including the discrete $l_0$ and the `$l_0+l_2$' type penalties. Penalty design and parameter tuning for nonconvex penalties are examined. Applications of super-resolution spectrum estimation in signal processing and cancer classification with joint gene selection in bioinformatics show the performance improvement by nonconvex penalized estimation.
Efficient Pseudo-Boolean Satisfiability Encodings for Routing and Wavelength Assignment in Optical Networks
Velev, Miroslav N. (Aries Design Automation) | Gao, Ping (Aries Design Automation)
We propose a novel method for combined Routing and Wave-length Assignment (RWA) in optical networks by reformulation to an equivalent Pseudo-Boolean Satisfiability (PB-SAT) problem. We introduce edge observability variables to represent whether an edge is on the optimal route, combined with either a simple or a hierarchical SAT encoding to select a wavelength for that edge only when the edge is on the route. This translation exponentially reduces the size of the solution space, making it independent of the number of wavelengths per link. We present experimental results for routing instances with up to 3,000 nodes, 15,000 edges, and 2,048 wavelengths per edge, and achieve at least 8 orders of magnitude speedup relative to a previous PB-SAT encoding by Aloul et al., such that the speedup is increasing with the number of nodes and edges in the network, and the number of wavelengths per edge. A portfolio of 4 parallel strategies, each based on the new approach and a different hierarchical encoding, resulted in additional speedup of up to 6 times, and reduced the variability of the run times for large networks.
Satisfiability Modulo Theories: An Efficient Approach for the Resource-Constrained Project Scheduling Problem
Ansรณtegui, Carlos (Universitat de Lleida) | Bofill, Miquel (Universitat de Girona) | Palahรญ, Miquel (Universitat de Girona) | Suy, Josep (Universitat de Girona) | Villaret, Mateu (Universitat de Girona)
The Resource-Constrained Project Scheduling Problem (RCPSP) and some of its extensions have been widely studied. Many approaches have been considered to solve this problem: constraint programming (CP), Boolean satisfiability (SAT), mixed integer linear programming (MILP), branch and bound algorithms (BB) and others. In this paper, we present a new approach for solving this problem: satisfiability modulo theories (SMT). Solvers for SMT generalize SAT solving by adding the ability to handle arithmetic and other theories. We provide several encodings of the RCPSP into SMT, and introduce rcp2smt, a tool for solving RCPSP instances using SMT solvers, which exhibits good performance.
Bayesian Optimization for Adaptive MCMC
Mahendran, Nimalan, Wang, Ziyu, Hamze, Firas, de Freitas, Nando
This paper proposes a new randomized strategy for adaptive MCMC using Bayesian optimization. This approach applies to non-differentiable objective functions and trades off exploration and exploitation to reduce the number of potentially costly objective function evaluations. We demonstrate the strategy in the complex setting of sampling from constrained, discrete and densely connected probabilistic graphical models where, for each variation of the problem, one needs to adjust the parameters of the proposal mechanism automatically to ensure efficient mixing of the Markov chains.
Efficient Marginal Likelihood Computation for Gaussian Process Regression
Schirru, Andrea, Pampuri, Simone, De Nicolao, Giuseppe, McLoone, Sean
In a Bayesian learning setting, the posterior distribution of a predictive model arises from a trade-off between its prior distribution and the conditional likelihood of observed data. Such distribution functions usually rely on additional hyperparameters which need to be tuned in order to achieve optimum predictive performance; this operation can be efficiently performed in an Empirical Bayes fashion by maximizing the posterior marginal likelihood of the observed data. Since the score function of this optimization problem is in general characterized by the presence of local optima, it is necessary to resort to global optimization strategies, which require a large number of function evaluations. Given that the evaluation is usually computationally intensive and badly scaled with respect to the dataset size, the maximum number of observations that can be treated simultaneously is quite limited. In this paper, we consider the case of hyperparameter tuning in Gaussian process regression. A straightforward implementation of the posterior log-likelihood for this model requires O(N^3) operations for every iteration of the optimization procedure, where N is the number of examples in the input dataset. We derive a novel set of identities that allow, after an initial overhead of O(N^3), the evaluation of the score function, as well as the Jacobian and Hessian matrices, in O(N) operations. We prove how the proposed identities, that follow from the eigendecomposition of the kernel matrix, yield a reduction of several orders of magnitude in the computation time for the hyperparameter optimization problem. Notably, the proposed solution provides computational advantages even with respect to state of the art approximations that rely on sparse kernel matrices.
Ordinal Risk-Group Classification
Most classification methods provide either a prediction of class membership or an assessment of class membership probability. In the case of two-group classification the predicted probability can be described as "risk" of belonging to a "special" class . When the required output is a set of ordinal-risk groups, a discretization of the continuous risk prediction is achieved by two common methods: by constructing a set of models that describe the conditional risk function at specific points (quantile regression) or by dividing the output of an "optimal" classification model into adjacent intervals that correspond to the desired risk groups. By defining a new error measure for the distribution of risk onto intervals we are able to identify lower bounds on the accuracy of these methods, showing sub-optimality both in their distribution of risk and in the efficiency of their resulting partition into intervals. By adding a new form of constraint to the existing maximum likelihood optimization framework and by introducing a penalty function to avoid degenerate solutions, we show how existing methods can be augmented to solve the ordinal risk-group classification problem. We implement our method for logistic regression (LR) and show a numeric example.
A Flexible, Scalable and Efficient Algorithmic Framework for Primal Graphical Lasso
Mazumder, Rahul, Agarwal, Deepak K.
We propose a scalable, efficient and statistically motivated computational framework for Graphical Lasso (Friedman et al., 2007b) - a covariance regularization framework that has received significant attention in the statistics community over the past few years. Existing algorithms have trouble in scaling to dimensions larger than a thousand. Our proposal significantly enhances the state-of-the-art for such moderate sized problems and gracefully scales to larger problems where other algorithms become practically infeasible. This requires a few key new ideas. We operate on the primal problem and use a subtle variation of block-coordinate-methods which drastically reduces the computational complexity by orders of magnitude. We provide rigorous theoretical guarantees on the convergence and complexity of our algorithm and demonstrate the effectiveness of our proposal via experiments. We believe that our framework extends the applicability of Graphical Lasso to large-scale modern applications like bioinformatics, collaborative filtering and social networks, among others.
Inferring Networks of Diffusion and Influence
Gomez-Rodriguez, Manuel, Leskovec, Jure, Krause, Andreas
Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected with a virus or adopt the information, observing individual transmissions (i.e., who infects whom, or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and finds provably near-optimal networks. We demonstrate the effectiveness of our approach by tracing information diffusion in a set of 170 million blogs and news articles over a one year period to infer how information flows through the online media space. We find that the diffusion network of news for the top 1,000 media sites and blogs tends to have a core-periphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.