Goto

Collaborating Authors

 Optimization


Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data

arXiv.org Machine Learning

Sketching techniques have become popular for scaling up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, while still maintaining the statistical power of big data. In this paper, we study sketching from an optimization point of view: we first show that the iterative Hessian sketch is an optimization process with preconditioning, and develop accelerated iterative Hessian sketch via the searching the conjugate direction; we then establish primal-dual connections between the Hessian sketch and dual random projection, and apply the preconditioned conjugate gradient approach on the dual problem, which leads to the accelerated iterative dual random projection methods. Finally to tackle the challenges from both large sample size and high-dimensionality, we propose the primal-dual sketch, which iteratively sketches the primal and dual formulations. We show that using a logarithmic number of calls to solvers of small scale problem, primal-dual sketch is able to recover the optimum of the original problem up to arbitrary precision. The proposed algorithms are validated via extensive experiments on synthetic and real data sets which complements our theoretical results.


Reports of the 2016 AAAI Workshop Program

AI Magazine

The Workshop Program of the Association for the Advancement of Artificial Intelligence’s Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) was held at the beginning of the conference, February 12-13, 2016. Workshop participants met and discussed issues with a selected focus — providing an informal setting for active exchange among researchers, developers and users on topics of current interest. To foster interaction and exchange of ideas, the workshops were kept small, with 25-65 participants. Attendance was sometimes limited to active participants only, but most workshops also allowed general registration by other interested individuals. The AAAI-16 Workshops were an excellent forum for exploring emerging approaches and task areas, for bridging the gaps between AI and other fields or between subfields of AI, for elucidating the results of exploratory research, or for critiquing existing approaches. The fifteen workshops held at AAAI-16 were Artificial Intelligence Applied to Assistive Technologies and Smart Environments (WS-16-01), AI, Ethics, and Society (WS-16-02), Artificial Intelligence for Cyber Security (WS-16-03), Artificial Intelligence for Smart Grids and Smart Buildings (WS-16-04), Beyond NP (WS-16-05), Computer Poker and Imperfect Information Games (WS-16-06), Declarative Learning Based Programming (WS-16-07), Expanding the Boundaries of Health Informatics Using AI (WS-16-08), Incentives and Trust in Electronic Communities (WS-16-09), Knowledge Extraction from Text (WS-16-10), Multiagent Interaction without Prior Coordination (WS-16-11), Planning for Hybrid Systems (WS-16-12), Scholarly Big Data: AI Perspectives, Challenges, and Ideas (WS-16-13), Symbiotic Cognitive Systems (WS-16-14), and World Wide Web and Population Health Intelligence (WS-16-15).


Distributed Convex Optimization with Many Convex Constraints

arXiv.org Machine Learning

An optimization problem can be called large scale if it involves a large number of optimization variables, and/or a large number of input parameters, and/or a large number of constraints. The increasing availability of distributed hardware suggests to address large scale optimization problems by distributed algorithms. Here we design and analyze a distributed algorithm for general convex optimization problems that involve a large number of constraints. The algorithm is a variant of the alternating direction method of multipliers (ADMM) that was proposed by Glowinski and Marroco [7] and by Gabay and Mercier [4]. Recently, ADMM regained significant attention, especially in the machine learning community, because it allows to solve convex optimization problems that involve a large number of parameters in a distributed setting [2]. In a typical machine learning method, like the least squares method for regression problems, the parameters are just the data points. Hence, ADMM is often the method of choice for Big Data problems, where the data does not fit into the memory of a single compute node. The optimization problem behind a typical Big Data machine learning method usually aims at minimizing a so called loss-function that is the sum of the losses for each data point. Hence, the objective function f of such problems is separable, i.e., it holds that f(x)


Pseudo-Bayesian Robust PCA: Algorithms and Analyses

arXiv.org Machine Learning

Commonly used in computer vision and other applications, robust PCA represents an algorithmic attempt to reduce the sensitivity of classical PCA to outliers. The basic idea is to learn a decomposition of some data matrix of interest into low rank and sparse components, the latter representing unwanted outliers. Although the resulting optimization problem is typically NP-hard, convex relaxations provide a computationally-expedient alternative with theoretical support. However, in practical regimes performance guarantees break down and a variety of non-convex alternatives, including Bayesian-inspired models, have been proposed to boost estimation quality. Unfortunately though, without additional a priori knowledge none of these methods can significantly expand the critical operational range such that exact principal subspace recovery is possible. Into this mix we propose a novel pseudo-Bayesian algorithm that explicitly compensates for design weaknesses in many existing non-convex approaches leading to state-of-the-art performance with a sound analytical foundation. Surprisingly, our algorithm can even outperform convex matrix completion despite the fact that the latter is provided with perfect knowledge of which entries are not corrupted.


The backtracking survey propagation algorithm for solving random K-SAT problems

arXiv.org Artificial Intelligence

Discrete combinatorial optimization plays a central role in many scientific disciplines, however for hard problems we lack linear time algorithms that would allow us to solve very large instances. Moreover it is still unclear what are the key features that make a discrete combinatorial optimization problem hard to solve. Here we study random K-satisfiability problems with K 3, 4 which are known to be very hard close to the SAT-UNSAT threshold, where problems stop having solutions. We show that the Backtracking Survey Propagation algorithm, in a time practically linear in the problem size, is able to find solutions very close to the threshold, in a region unreachable by any other algorithm. All solutions found have no frozen variables, thus supporting the conjecture that only unfrozen solutions can be found in linear time, and that a problem becomes impossibile to solve in linear time when all solutions contain frozen variables. Optimization problems with discrete variables are widespread among scientific disciplines and often among the hardest to solve.


Natural Gradients and Stochastic Variational Inference

#artificialintelligence

Examining the Fisher for diagonal Gaussians allows us to see exactly how the natural gradient differs from the standard gradient. If a dimension has small variance, then preconditioning by the inverse Fisher makes the natural gradient smaller along the mean dimension, . Intuitively, this makes sense -- we want our optimization routine to slow down when the component variance is small because small changes to the mean correspond to big changes in KL (see the two skinny Gaussians above), which can result in chatoic looking optimization traces. When the variance along a dimension is large, the inverse Fisher elongates the standard gradient along that dimension. Again, this makes sense -- when the component variance is high we can move the mean a lot farther (in Euclidean distance) without moving that far in terms of KL (see the two wide Gaussians above).


Funneled Bayesian Optimization for Design, Tuning and Control of Autonomous Systems

arXiv.org Machine Learning

Bayesian optimization has become a fundamental global optimization algorithm in many problems where sample efficiency is of paramount importance. Recently, there has been proposed a large number of new applications in fields such as robotics, machine learning, experimental design, simulation, etc. In this paper, we focus on several problems that appear in robotics and autonomous systems: algorithm tuning, automatic control and intelligent design. All those problems can be mapped to global optimization problems. However, they become hard optimization problems. Bayesian optimization internally uses a probabilistic surrogate model (e.g.: Gaussian process) to learn from the process and reduce the number of samples required. In order to generalize to unknown functions in a black-box fashion, the common assumption is that the underlying function can be modeled with a stationary process. Nonstationary Gaussian process regression cannot generalize easily and it typically requires prior knowledge of the function. Some works have designed techniques to generalize Bayesian optimization to nonstationary functions in an indirect way, but using techniques originally designed for regression, where the objective is to improve the quality of the surrogate model everywhere. Instead optimization should focus on improving the surrogate model near the optimum. In this paper, we present a novel kernel function specially designed for Bayesian optimization, that allows nonstationary behavior of the surrogate model in an adaptive local region. In our experiments, we found that this new kernel results in an improved local search (exploitation), without penalizing the global search (exploration). We provide results in well-known benchmarks and real applications. The new method outperforms the state of the art in Bayesian optimization both in stationary and nonstationary problems.


Data-Driven Learning of a Union of Sparsifying Transforms Model for Blind Compressed Sensing

arXiv.org Machine Learning

Compressed sensing is a powerful tool in applications such as magnetic resonance imaging (MRI). It enables accurate recovery of images from highly undersampled measurements by exploiting the sparsity of the images or image patches in a transform domain or dictionary. In this work, we focus on blind compressed sensing (BCS), where the underlying sparse signal model is a priori unknown, and propose a framework to simultaneously reconstruct the underlying image as well as the unknown model from highly undersampled measurements. Specifically, our model is that the patches of the underlying image(s) are approximately sparse in a transform domain. We also extend this model to a union of transforms model that better captures the diversity of features in natural images. The proposed block coordinate descent type algorithms for blind compressed sensing are highly efficient, and are guaranteed to converge to at least the partial global and partial local minimizers of the highly non-convex BCS problems. Our numerical experiments show that the proposed framework usually leads to better quality of image reconstructions in MRI compared to several recent image reconstruction methods. Importantly, the learning of a union of sparsifying transforms leads to better image reconstructions than a single adaptive transform.


Online Optimization with Costly and Noisy Measurements using Random Fourier Expansions

arXiv.org Machine Learning

This paper analyzes DONE, an online optimization algorithm that iteratively minimizes an unknown function based on costly and noisy measurements. The algorithm maintains a surrogate of the unknown function in the form of a random Fourier expansion (RFE). The surrogate is updated whenever a new measurement is available, and then used to determine the next measurement point. The algorithm is comparable to Bayesian optimization algorithms, but its computational complexity per iteration does not depend on the number of measurements. We derive several theoretical results that provide insight on how the hyper-parameters of the algorithm should be chosen. The algorithm is compared to a Bayesian optimization algorithm for a benchmark problem and three applications, namely, optical coherence tomography, optical beam-forming network tuning, and robot arm control. It is found that the DONE algorithm is significantly faster than Bayesian optimization in the discussed problems, while achieving a similar or better performance.


Predictive Coarse-Graining

arXiv.org Machine Learning

We propose a data-driven, coarse-graining formulation in the context of equilibrium statistical mechanics. In contrast to existing techniques which are based on a fine-to-coarse map, we adopt the opposite strategy by prescribing a probabilistic coarse-to-fine map. This corresponds to a directed probabilistic model where the coarse variables play the role of latent generators of the fine scale (all-atom) data. From an information-theoretic perspective, the framework proposed provides an improvement upon the relative entropy method and is capable of quantifying the uncertainty due to the information loss that unavoidably takes place during the CG process. Furthermore, it can be readily extended to a fully Bayesian model where various sources of uncertainties are reflected in the posterior of the model parameters. The latter can be used to produce not only point estimates of fine-scale reconstructions or macroscopic observables, but more importantly, predictive posterior distributions on these quantities. Predictive posterior distributions reflect the confidence of the model as a function of the amount of data and the level of coarse-graining. The issues of model complexity and model selection are seamlessly addressed by employing a hierarchical prior that favors the discovery of sparse solutions, revealing the most prominent features in the coarse-grained model. A flexible and parallelizable Monte Carlo - Expectation-Maximization (MC-EM) scheme is proposed for carrying out inference and learning tasks. A comparative assessment of the proposed methodology is presented for a lattice spin system and the SPC/E water model.