Goto

Collaborating Authors

 Optimization


Scaling the Indian Buffet Process via Submodular Maximization

arXiv.org Machine Learning

Inference for latent feature models is inherently difficult as the inference space grows exponentially with the size of the input data and number of latent features. In this work, we use Kurihara & Welling (2008)'s maximization-expectation framework to perform approximate MAP inference for linear-Gaussian latent feature models with an Indian Buffet Process (IBP) prior. This formulation yields a submodular function of the features that corresponds to a lower bound on the model evidence. By adding a constant to this function, we obtain a nonnegative submodular function that can be maximized via a greedy algorithm that obtains at least a one-third approximation to the optimal solution. Our inference method scales linearly with the size of the input data, and we show the efficacy of our method on the largest datasets currently analyzed using an IBP model.


On the Computation of Fully Proportional Representation

Journal of Artificial Intelligence Research

We investigate two systems of fully proportional representation suggested by Chamberlin & Courant and Monroe. Both systems assign a representative to each voter so that the "sum of misrepresentations" is minimized. The winner determination problem for both systems is known to be NP-hard, hence this work aims at investigating whether there are variants of the proposed rules and/or specific electorates for which these problems can be solved efficiently. As a variation of these rules, instead of minimizing the sum of misrepresentations, we considered minimizing the maximal misrepresentation introducing effectively two new rules. In the general case these "minimax" versions of classical rules appeared to be still NP-hard. We investigated the parameterized complexity of winner determination of the two classical and two new rules with respect to several parameters. Here we have a mixture of positive and negative results: e.g., we proved fixed-parameter tractability for the parameter the number of candidates but fixed-parameter intractability for the number of winners. For single-peaked electorates our results are overwhelmingly positive: we provide polynomial-time algorithms for most of the considered problems. The only rule that remains NP-hard for single-peaked electorates is the classical Monroe rule.


How to minimize the energy consumption in mobile ad-hoc networks

arXiv.org Artificial Intelligence

In this work we are interested in the problem of energy management in Mobile Ad-hoc Network (MANET). The solving and optimization of MANET allow assisting the users to efficiently use their devices in order to minimize the batteries power consumption. In this framework, we propose a modelling of the MANET in form of a Constraint Optimization Problem called COMANET. Then, in the objective to minimize the consumption of batteries power, we present an approach based on an adaptation of the Dijkstra's algorithm to the MANET problem called MANED. Finally, we expose some experimental results showing utility of this approach.


Dictionary LASSO: Guaranteed Sparse Recovery under Linear Transformation

arXiv.org Machine Learning

We consider the following signal recovery problem: given a measurement matrix $\Phi\in \mathbb{R}^{n\times p}$ and a noisy observation vector $c\in \mathbb{R}^{n}$ constructed from $c = \Phi\theta^* + \epsilon$ where $\epsilon\in \mathbb{R}^{n}$ is the noise vector whose entries follow i.i.d. centered sub-Gaussian distribution, how to recover the signal $\theta^*$ if $D\theta^*$ is sparse {\rca under a linear transformation} $D\in\mathbb{R}^{m\times p}$? One natural method using convex optimization is to solve the following problem: $$\min_{\theta} {1\over 2}\|\Phi\theta - c\|^2 + \lambda\|D\theta\|_1.$$ This paper provides an upper bound of the estimate error and shows the consistency property of this method by assuming that the design matrix $\Phi$ is a Gaussian random matrix. Specifically, we show 1) in the noiseless case, if the condition number of $D$ is bounded and the measurement number $n\geq \Omega(s\log(p))$ where $s$ is the sparsity number, then the true solution can be recovered with high probability; and 2) in the noisy case, if the condition number of $D$ is bounded and the measurement increases faster than $s\log(p)$, that is, $s\log(p)=o(n)$, the estimate error converges to zero with probability 1 when $p$ and $s$ go to infinity. Our results are consistent with those for the special case $D=\bold{I}_{p\times p}$ (equivalently LASSO) and improve the existing analysis. The condition number of $D$ plays a critical role in our analysis. We consider the condition numbers in two cases including the fused LASSO and the random graph: the condition number in the fused LASSO case is bounded by a constant, while the condition number in the random graph case is bounded with high probability if $m\over p$ (i.e., $#text{edge}\over #text{vertex}$) is larger than a certain constant. Numerical simulations are consistent with our theoretical results.


Variational Algorithms for Marginal MAP

arXiv.org Artificial Intelligence

The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of "mixed-product" message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel "argmax-product" message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods.


A Lipschitz Exploration-Exploitation Scheme for Bayesian Optimization

arXiv.org Machine Learning

The problem of optimizing unknown costly-to-evaluate functions has been studied for a long time in the context of Bayesian Optimization. Algorithms in this field aim to find the optimizer of the function by asking only a few function evaluations at locations carefully selected based on a posterior model. In this paper, we assume the unknown function is Lipschitz continuous. Leveraging the Lipschitz property, we propose an algorithm with a distinct exploration phase followed by an exploitation phase. The exploration phase aims to select samples that shrink the search space as much as possible. The exploitation phase then focuses on the reduced search space and selects samples closest to the optimizer. Considering the Expected Improvement (EI) as a baseline, we empirically show that the proposed algorithm significantly outperforms EI.


Efficient Mixed-Norm Regularization: Algorithms and Safe Screening Methods

arXiv.org Machine Learning

Sparse learning has recently received increasing attention in many areas including machine learning, statistics, and applied mathematics. The mixed-norm regularization based on the l1q norm with q>1 is attractive in many applications of regression and classification in that it facilitates group sparsity in the model. The resulting optimization problem is, however, challenging to solve due to the inherent structure of the mixed-norm regularization. Existing work deals with special cases with q=1, 2, infinity, and they cannot be easily extended to the general case. In this paper, we propose an efficient algorithm based on the accelerated gradient method for solving the general l1q-regularized problem. One key building block of the proposed algorithm is the l1q-regularized Euclidean projection (EP_1q). Our theoretical analysis reveals the key properties of EP_1q and illustrates why EP_1q for the general q is significantly more challenging to solve than the special cases. Based on our theoretical analysis, we develop an efficient algorithm for EP_1q by solving two zero finding problems. To further improve the efficiency of solving large dimensional mixed-norm regularized problems, we propose a screening method which is able to quickly identify the inactive groups, i.e., groups that have 0 components in the solution. This may lead to substantial reduction in the number of groups to be entered to the optimization. An appealing feature of our screening method is that the data set needs to be scanned only once to run the screening. Compared to that of solving the mixed-norm regularized problems, the computational cost of our screening test is negligible. The key of the proposed screening method is an accurate sensitivity analysis of the dual optimal solution when the regularization parameter varies. Experimental results demonstrate the efficiency of the proposed algorithm.


Knowledge Matters: Importance of Prior Information for Optimization

arXiv.org Machine Learning

We explore the effect of introducing prior information into the intermediate level of neural networks for a learning task on which all the state-of-the-art machine learning algorithms tested failed to learn. We motivate our work from the hypothesis that humans learn such intermediate concepts from other individuals via a form of supervision or guidance using a curriculum. The experiments we have conducted provide positive evidence in favor of this hypothesis. In our experiments, a two-tiered MLP architecture is trained on a dataset with 64x64 binary inputs images, each image with three sprites. The final task is to decide whether all the sprites are the same or one of them is different. Sprites are pentomino tetris shapes and they are placed in an image with different locations using scaling and rotation transformations. The first part of the two-tiered MLP is pre-trained with intermediate-level targets being the presence of sprites at each location, while the second part takes the output of the first part as input and predicts the final task's target binary event. The two-tiered MLP architecture, with a few tens of thousand examples, was able to learn the task perfectly, whereas all other algorithms (include unsupervised pre-training, but also traditional algorithms like SVMs, decision trees and boosting) all perform no better than chance. We hypothesize that the optimization difficulty involved when the intermediate pre-training is not performed is due to the {\em composition} of two highly non-linear tasks. Our findings are also consistent with hypotheses on cultural learning inspired by the observations of optimization problems with deep learning, presumably because of effective local minima.


A Reliable Effective Terascale Linear Learning System

arXiv.org Machine Learning

We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets, with trillions of features, {The number of features here refers to the number of non-zero entries in the data matrix.} billions of training examples and millions of parameters in an hour using a cluster of 1000 machines. Individually none of the component techniques are new, but the careful synthesis required to obtain an efficient implementation is. The result is, up to our knowledge, the most scalable and efficient linear learning system reported in the literature (as of 2011 when our experiments were conducted). We describe and thoroughly evaluate the components of the system, showing the importance of the various design choices.


An Evolutionary Search Algorithm to Guide Stochastic Search for Near-Native Protein Conformations with Multiobjective Analysis

AAAI Conferences

Predicting native conformations of a protein sequence is known as de novo structure prediction and is a central challenge in computational biology. Most computational protocols employ Monte Carlo sampling. Evolutionary search algorithms have also been proposed to enhance sampling of near-native conformations. These approaches bias stochastic search by an energy function, even though current energy functions are known to be inaccurate and drive sampling to non-native energy minima. This paper proposes a multiobjective approach which employs Pareto dominance, rather than total energy, to evaluate a conformation. This multiobjective approach accounts for the fact that terms in an energy function are conflicting optimization criteria. Our analysis is conducted on a diverse set of 20 proteins. Results show that employing Pareto dominance, rather than total energy, to guide stochastic search is more effective at sampling conformations which are both lower in energy and near the protein native structure.