Goto

Collaborating Authors

 Optimization


Finding Dense Clusters via "Low Rank + Sparse" Decomposition

arXiv.org Machine Learning

Finding "densely connected clusters" in a graph is in general an important and well studied problem in the literature \cite{Schaeffer}. It has various applications in pattern recognition, social networking and data mining \cite{Duda,Mishra}. Recently, Ames and Vavasis have suggested a novel method for finding cliques in a graph by using convex optimization over the adjacency matrix of the graph \cite{Ames, Ames2}. Also, there has been recent advances in decomposing a given matrix into its "low rank" and "sparse" components \cite{Candes, Chandra}. In this paper, inspired by these results, we view "densely connected clusters" as imperfect cliques, where imperfections correspond missing edges, which are relatively sparse. We analyze the problem in a probabilistic setting and aim to detect disjointly planted clusters. Our main result basically suggests that, one can find \emph{dense} clusters in a graph, as long as the clusters are sufficiently large. We conclude by discussing possible extensions and future research directions.


Implementing regularization implicitly via approximate eigenvector computation

arXiv.org Machine Learning

Regularization is a powerful technique for extracting useful information from noisy data. Typically, it is implemented by adding some sort of norm constraint to an objective function and then exactly optimizing the modified objective function. This procedure often leads to optimization problems that are computationally more expensive than the original problem, a fact that is clearly problematic if one is interested in large-scale applications. On the other hand, a large body of empirical work has demonstrated that heuristics, and in some cases approximation algorithms, developed to speed up computations sometimes have the side-effect of performing regularization implicitly. Thus, we consider the question: What is the regularized optimization objective that an approximation algorithm is exactly optimizing? We address this question in the context of computing approximations to the smallest nontrivial eigenvector of a graph Laplacian; and we consider three random-walk-based procedures: one based on the heat kernel of the graph, one based on computing the the PageRank vector associated with the graph, and one based on a truncated lazy random walk. In each case, we provide a precise characterization of the manner in which the approximation method can be viewed as implicitly computing the exact solution to a regularized problem. Interestingly, the regularization is not on the usual vector form of the optimization problem, but instead it is on a related semidefinite program.


Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling

arXiv.org Machine Learning

The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. It arises in various application domains, including distributed tracking and localization, multi-agent co-ordination, estimation in sensor networks, and large-scale optimization in machine learning. We develop and analyze distributed algorithms based on dual averaging of subgradients, and we provide sharp bounds on their convergence rates as a function of the network size and topology. Our method of analysis allows for a clear separation between the convergence of the optimization algorithm itself and the effects of communication constraints arising from the network structure. In particular, we show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. Our approach includes both the cases of deterministic optimization and communication, as well as problems with stochastic optimization and/or communication.


Efficient First Order Methods for Linear Composite Regularizers

arXiv.org Machine Learning

A wide class of regularization problems in machine learning and statistics employ a regularization term which is obtained by composing a simple convex function \omega with a linear transformation. This setting includes Group Lasso methods, the Fused Lasso and other total variation methods, multi-task learning methods and many more. In this paper, we present a general approach for computing the proximity operator of this class of regularizers, under the assumption that the proximity operator of the function \omega is known in advance. Our approach builds on a recent line of research on optimal first order optimization methods and uses fixed point iterations for numerically computing the proximity operator. It is more general than current approaches and, as we show with numerical simulations, computationally more efficient than available first order methods which do not achieve the optimal rate. In particular, our method outperforms state of the art O(1/T) methods for overlapping Group Lasso and matches optimal O(1/T^2) methods for the Fused Lasso and tree structured Group Lasso.


Planar Cycle Covering Graphs

arXiv.org Machine Learning

We describe a new variational lower-bound on the minimum energy configuration of a planar binary Markov Random Field (MRF). Our method is based on adding auxiliary nodes to every face of a planar embedding of the graph in order to capture the effect of unary potentials. A ground state of the resulting approximation can be computed efficiently by reduction to minimum-weight perfect matching. We show that optimization of variational parameters achieves the same lower-bound as dual-decomposition into the set of all cycles of the original graph. We demonstrate that our variational optimization converges quickly and provides high-quality solutions to hard combinatorial problems 10-100x faster than competing algorithms that optimize the same bound.


Regularizers for Structured Sparsity

arXiv.org Machine Learning

We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. This problem is relevant in machine learning, statistics and signal processing. It is well known that a linear regression can benefit from knowledge that the underlying regression vector is sparse. The combinatorial problem of selecting the nonzero components of this vector can be "relaxed" by regularizing the squared error with a convex penalty function like the $\ell_1$ norm. However, in many applications, additional conditions on the structure of the regression vector and its sparsity pattern are available. Incorporating this information into the learning method may lead to a significant decrease of the estimation error. In this paper, we present a family of convex penalty functions, which encode prior knowledge on the structure of the vector formed by the absolute values of the regression coefficients. This family subsumes the $\ell_1$ norm and is flexible enough to include different models of sparsity patterns, which are of practical and theoretical importance. We establish the basic properties of these penalty functions and discuss some examples where they can be computed explicitly. Moreover, we present a convergent optimization algorithm for solving regularized least squares with these penalty functions. Numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso method and other related methods.


A Knowledge-Based Approach to Problem Formulation for Product Model-Based Multidisciplinary Design Optimization in AEC

AAAI Conferences

The cost-effectiveness and accuracy of a multidisciplinary design optimization (MDO) process is highly dependent on designersโ€™ ability to flexibly formulate the optimization problem for specific challenges. Designers need to rapidly modify how object parameters are assigned to groupings of objects in the product model. Our research has developed a Reference-Based Optimization Method (RBOM) to enable this type of flexible problem formulation. However, the responsibility still falls on the designer to manage the problem formulation and MDO process, which can lead to inefficient and costly design decisions. By means of artificial intelligence, in particular knowledge-based systems, these potential barriers to MDO adoption in the Architecture, Engineering, and Construction (AEC) industry could be mitigated, resulting in more efficient design processes and, ultimately, energy-efficient built environments.


C-HiLasso: A Collaborative Hierarchical Sparse Modeling Framework

arXiv.org Machine Learning

Sparse modeling is a powerful framework for data analysis and processing. Traditionally, encoding in this framework is performed by solving an L1-regularized linear regression problem, commonly referred to as Lasso or Basis Pursuit. In this work we combine the sparsity-inducing property of the Lasso model at the individual feature level, with the block-sparsity property of the Group Lasso model, where sparse groups of features are jointly encoded, obtaining a sparsity pattern hierarchically structured. This results in the Hierarchical Lasso (HiLasso), which shows important practical modeling advantages. We then extend this approach to the collaborative case, where a set of simultaneously coded signals share the same sparsity pattern at the higher (group) level, but not necessarily at the lower (inside the group) level, obtaining the collaborative HiLasso model (C-HiLasso). Such signals then share the same active groups, or classes, but not necessarily the same active set. This model is very well suited for applications such as source identification and separation. An efficient optimization procedure, which guarantees convergence to the global optimum, is developed for these new models. The underlying presentation of the new framework and optimization approach is complemented with experimental examples and theoretical results regarding recovery guarantees for the proposed models.


Estimation of low-rank tensors via convex optimization

arXiv.org Machine Learning

In this paper, we propose three approaches for the estimation of the Tucker decomposition of multi-way arrays (tensors) from partial observations. All approaches are formulated as convex minimization problems. Therefore, the minimum is guaranteed to be unique. The proposed approaches can automatically estimate the number of factors (rank) through the optimization. Thus, there is no need to specify the rank beforehand. The key technique we employ is the trace norm regularization, which is a popular approach for the estimation of low-rank matrices. In addition, we propose a simple heuristic to improve the interpretability of the obtained factorization. The advantages and disadvantages of three proposed approaches are demonstrated through numerical experiments on both synthetic and real world datasets. We show that the proposed convex optimization based approaches are more accurate in predictive performance, faster, and more reliable in recovering a known multilinear structure than conventional approaches.


Decision Making Agent Searching for Markov Models in Near-Deterministic World

arXiv.org Artificial Intelligence

Reinforcement learning has solid foundations, but becomes inefficient in partially observed (non-Markovian) environments. Thus, a learning agent -born with a representation and a policy- might wish to investigate to what extent the Markov property holds. We propose a learning architecture that utilizes combinatorial policy optimization to overcome non-Markovity and to develop efficient behaviors, which are easy to inherit, tests the Markov property of the behavioral states, and corrects against non-Markovity by running a deterministic factored Finite State Model, which can be learned. We illustrate the properties of architecture in the near deterministic Ms. Pac-Man game. We analyze the architecture from the point of view of evolutionary, individual, and social learning.