Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models

Neural Information Processing Systems

Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial.
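
The forward-with-adaptive-backward idea described above can be sketched for least-squares regression as follows. This is a minimal illustration under stated assumptions, not the authors' exact procedure: the function names, the stopping threshold `eps`, the backward ratio `nu`, and the `max_steps` cap are all hypothetical choices made for this example.

```python
import numpy as np

def least_squares_loss(X, y, support):
    """Mean squared error of the least-squares fit restricted to `support`."""
    if not support:
        return float(np.mean(y ** 2))
    cols = sorted(support)
    w, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    return float(np.mean((y - X[:, cols] @ w) ** 2))

def forward_backward_greedy(X, y, eps=1e-3, nu=0.5, max_steps=100):
    """Greedy feature selection: forward steps, with backward steps when cheap.

    eps, nu and max_steps are illustrative parameters (assumptions).
    """
    support = set()
    loss = least_squares_loss(X, y, support)
    for _ in range(max_steps):
        candidates = [j for j in range(X.shape[1]) if j not in support]
        if not candidates:
            break
        # Forward step: add the feature that reduces the loss the most.
        best_j = min(candidates,
                     key=lambda j: least_squares_loss(X, y, support | {j}))
        new_loss = least_squares_loss(X, y, support | {best_j})
        gain = loss - new_loss
        if gain <= eps:
            break  # no remaining feature helps enough
        support.add(best_j)
        loss = new_loss
        # Backward steps: drop a feature whenever doing so costs much less
        # than the last forward step gained.
        while len(support) > 1:
            worst_j = min(support,
                          key=lambda j: least_squares_loss(X, y, support - {j}))
            loss_without = least_squares_loss(X, y, support - {worst_j})
            if loss_without - loss >= nu * gain:
                break
            support.remove(worst_j)
            loss = loss_without
    return sorted(support)
```

Calling `forward_backward_greedy(X, y)` on a design matrix `X` and target vector `y` returns the sorted indices of the selected columns. The backward test compares the cost of a removal against the most recent forward gain, which is what makes the backward steps adaptive rather than run on a fixed schedule.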


Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework

Neural Information Processing Systems

Regularization techniques have become a principal tool in statistics and machine learning research and practice. However, in most situations these regularization terms are not well interpreted, especially in how they relate to the loss function and the data. In this paper, we propose a robust minimax framework to interpret the relationship between data and regularization terms for a large class of loss functions. We show that various regularization terms essentially correspond to different distortions of the original data matrix. This minimax framework includes ridge regression, lasso, elastic net, fused lasso, group lasso, local coordinate coding, multiple kernel learning, etc., as special cases.
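
To make the link between data distortions and regularization terms concrete, here is one well-known equivalence of this kind from the robust regression literature. It is offered only as an illustration and is not necessarily the formulation used in the paper. The notation is an assumption: $X$ is the data matrix with $p$ columns, $y$ the response vector, $\delta_i$ the perturbation applied to column $i$, and $c_i \ge 0$ the bound on that perturbation.

```latex
\min_{w} \; \max_{\|\delta_i\|_2 \le c_i,\; i = 1,\dots,p} \;
  \bigl\| y - (X + \Delta)\, w \bigr\|_2
\;=\;
\min_{w} \; \bigl\| y - X w \bigr\|_2 + \sum_{i=1}^{p} c_i \, |w_i| ,
\qquad \Delta = [\delta_1, \dots, \delta_p].
```

Setting every $c_i = \lambda$ gives an $\ell_1$ penalty of weight $\lambda$ on an unsquared residual norm, so a column-wise bounded distortion of the data matrix plays the role of a lasso-type regularizer.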


Bootstrapping from Game Tree Search

Neural Information Processing Systems

In this paper we introduce a new algorithm for updating the parameters of a heuristic evaluation function, by updating the heuristic towards the values computed by an alpha-beta search. Our algorithm differs from previous approaches to learning from search, such as Samuel's checkers player and the TD-Leaf algorithm, in two key ways. First, we update all nodes in the search tree, rather than a single node. Second, we use the outcome of a deep search, instead of the outcome of a subsequent search, as the training signal for the evaluation function. We implemented our algorithm in a chess program, Meep, using a linear heuristic function.
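
The two ideas in this abstract (every node in the tree is updated, and the target is the value backed up by the search itself) can be sketched on a toy tree. This is a simplified illustration under stated assumptions: it uses plain minimax rather than alpha-beta, a hand-built tree rather than a chess engine, and the `Node` class, function names, and learning rate are hypothetical.

```python
import numpy as np

class Node:
    """Toy game-tree node with a feature vector (hypothetical structure)."""
    def __init__(self, features, children=()):
        self.features = np.asarray(features, dtype=float)
        self.children = list(children)

def search_values(node, w, maximizing=True, out=None):
    """Plain minimax with a linear heuristic w . features at the leaves.

    Returns the value of `node` and a list of (node, value) pairs covering
    every node visited in the tree.
    """
    if out is None:
        out = []
    if not node.children:
        value = float(w @ node.features)
    else:
        child_values = [search_values(c, w, not maximizing, out)[0]
                        for c in node.children]
        value = max(child_values) if maximizing else min(child_values)
    out.append((node, value))
    return value, out

def bootstrap_update(root, w, lr=0.1):
    """One gradient step moving the heuristic at all nodes toward search values."""
    _, targets = search_values(root, w)
    grad = np.zeros_like(w)
    for node, target in targets:
        error = target - float(w @ node.features)  # zero at leaves
        grad += error * node.features
    return w + lr * grad / len(targets)
```

Leaves contribute no error because their search value is the heuristic itself, so the update is driven by the interior nodes, whose static evaluation is pulled toward the deeper backed-up value.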


Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness

Neural Information Processing Systems

This paper uses information-theoretic techniques to determine minimax rates for estimating nonparametric sparse additive regression models under high-dimensional scaling. The first term reflects the difficulty of performing subset selection and is independent of the Hilbert space $\mathcal{H}$; the second term is an $s$-dimensional estimation term, depending only on the low dimension $s$ but not the ambient dimension $p$, that captures the difficulty of estimating a sum of $s$ univariate functions in the Hilbert space $\mathcal{H}$. The minimax rates are compared with rates achieved by an $\ell_1$-penalty based approach; it can be shown that a certain $\ell_1$-based approach achieves the minimax optimal rate.


Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models

Neural Information Processing Systems

In this paper we make several contributions towards accelerating approximate Bayesian structural inference for non-decomposable GGMs. Our first contribution is to show how to efficiently compute a BIC or Laplace approximation to the marginal likelihood of non-decomposable graphs using convex methods for precision matrix estimation. This optimization technique can be used as a fast scoring function inside standard Stochastic Local Search (SLS) for generating posterior samples. Our second contribution is a novel framework for efficiently generating large sets of high-quality graph topologies without performing local search. This graph proposal method, which we call "Neighborhood Fusion" (NF), samples candidate Markov blankets at each node using sparse regression techniques. Our final contribution is a hybrid method combining the complementary strengths of NF and SLS. Experimental results in structural recovery and prediction tasks demonstrate that NF and hybrid NF/SLS outperform state-of-the-art local search methods, on both synthetic and real-world datasets, when realistic computational limits are imposed.


Nonparametric Greedy Algorithms for the Sparse Learning Problem

Neural Information Processing Systems

This paper studies the forward greedy strategy in sparse nonparametric regression. For additive models, we propose an algorithm called additive forward regression; for general multivariate regression, we propose an algorithm called generalized forward regression. Both of them simultaneously conduct estimation and variable selection in nonparametric settings for the high dimensional sparse learning problem. Our main emphasis is empirical: on both simulated and real data, these two simple greedy methods can clearly outperform several state-of-the-art competitors, including the LASSO, a nonparametric version of the LASSO called the sparse additive model (SpAM) and a recently proposed adaptive parametric forward-backward algorithm called the Foba. Some theoretical justifications are also provided.


A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

Neural Information Processing Systems

Parametric policy search algorithms are one of the methods of choice for the optimisation of Markov Decision Processes, with Expectation Maximisation and natural gradient ascent being considered the current state of the art in the field. In this article we provide a unifying perspective of these two algorithms by showing that their step-directions in the parameter space are closely related to the search direction of an approximate Newton method. This analysis leads naturally to the consideration of this approximate Newton method as an alternative gradient-based method for Markov Decision Processes. We are able to show that the algorithm has numerous desirable properties, absent in the naive application of Newton's method, that make it a viable alternative to either Expectation Maximisation or natural gradient ascent. Empirical results suggest that the algorithm has excellent convergence and robustness properties, performing strongly in comparison to both Expectation Maximisation and natural gradient ascent.


Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

Neural Information Processing Systems

In this work we use Branch-and-Bound (BB) to efficiently detect objects with deformable part models. Instead of evaluating the classifier score exhaustively over image locations and scales, we use BB to focus on promising image locations. The core problem is to compute bounds that accommodate part deformations; for this we adapt the Dual Trees data structure to our problem. We evaluate our approach using Mixture-of-Deformable Part Models. We obtain exactly the same results but are 10-20 times faster on average.
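
The branch-and-bound idea can be sketched in one dimension with a root filter and a single deformable part. This is a toy illustration under stated assumptions, not the paper's method: the score model (root score plus the best part score minus a linear displacement cost), the function names, and the naive bound computation are all hypothetical, and the bound here is recomputed in linear time where the paper uses Dual Trees to make it cheap.

```python
import heapq
import numpy as np

def exact_score(root, part, x):
    """Root score at x plus the best part placement, penalised by displacement."""
    xs = np.arange(len(part))
    return float(root[x] + np.max(part - np.abs(xs - x)))

def upper_bound(root, part, lo, hi):
    """Upper bound on exact_score for every x in the inclusive interval [lo, hi].

    Uses the distance from each part location to the interval, which never
    exceeds its distance to any single x inside the interval.
    """
    xs = np.arange(len(part))
    dist = np.maximum(0, np.maximum(lo - xs, xs - hi))
    return float(np.max(root[lo:hi + 1]) + np.max(part - dist))

def branch_and_bound_argmax(root, part):
    """Best-first search over intervals; returns (location, score) of a maximum."""
    n = len(root)
    heap = [(-upper_bound(root, part, 0, n - 1), 0, n - 1)]
    while heap:
        neg_bound, lo, hi = heapq.heappop(heap)
        if lo == hi:
            # For a single location the bound equals the exact score, and it
            # is at least as large as every bound still in the queue.
            return lo, -neg_bound
        mid = (lo + hi) // 2
        for a, b in ((lo, mid), (mid + 1, hi)):
            heapq.heappush(heap, (-upper_bound(root, part, a, b), a, b))
```

Because the bound is tight on single locations and valid on intervals, the first single location popped from the queue is a true maximiser, which is why this style of search returns exactly the same result as exhaustive evaluation while skipping intervals whose bound is too low.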


Learning from the Wisdom of Crowds by Minimax Entropy

Neural Information Processing Systems

An important way to make large training sets is to gather noisy labels from crowds of nonexperts. We propose a minimax entropy principle to improve the quality of these labels. Our method assumes that labels are generated by a probability distribution over workers, items, and labels. We infer the ground truth by minimizing the entropy of this distribution, which we show minimizes the Kullback-Leibler (KL) divergence between the probability distribution and the unknown truth. We show that a simple coordinate descent scheme can optimize minimax entropy.


Selecting Diverse Features via Spectral Regularization

Neural Information Processing Systems

We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons, such as interpretability and robustness to noise. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized approximately by efficient greedy and local search algorithms, with provable guarantees. We compare our algorithms to traditional greedy and $\ell_1$-regularization schemes and show that we obtain a more diverse set of features that makes the regression problem stable under perturbations.