Goto

Collaborating Authors

 Genre


Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation

arXiv.org Machine Learning

The goal of reinforcement learning (RL) is to let an agent learn an optimal control policy in an unknown environment so that future expected rewards are maximized. The model-free RL approach directly learns the policy based on data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is often expensive in practice. On the other hand, the model-based RL approach first estimates the transition model of the environment and then learns the policy based on the estimated transition model. Thus, if the transition model is accurately learned from a small amount of data, the model-based approach can perform better than the model-free approach. In this paper, we propose a novel model-based RL method by combining a recently proposed model-free policy search method called policy gradients with parameter-based exploration and the state-of-the-art transition model estimator called least-squares conditional density estimation. Through experiments, we demonstrate the practical usefulness of the proposed method.


A Safe Screening Rule for Sparse Logistic Regression

arXiv.org Machine Learning

The l1-regularized logistic regression (or sparse logistic regression) is a widely used method for simultaneous classification and feature selection. Although many recent efforts have been devoted to its efficient implementation, its application to high dimensional data still poses significant challenges. In this paper, we present a fast and effective sparse logistic regression screening rule (Slores) to identify the 0 components in the solution vector, which may lead to a substantial reduction in the number of features to be entered to the optimization. An appealing feature of Slores is that the data set needs to be scanned only once to run the screening and its computational cost is negligible compared to that of solving the sparse logistic regression problem. Moreover, Slores is independent of solvers for sparse logistic regression, thus Slores can be integrated with any existing solver to improve the efficiency. We have evaluated Slores using high-dimensional data sets from different applications. Extensive experimental results demonstrate that Slores outperforms the existing state-of-the-art screening rules and the efficiency of solving sparse logistic regression is improved by one magnitude in general.


Noisy Subspace Clustering via Thresholding

arXiv.org Machine Learning

We consider the problem of clustering noisy high-dimensional data points into a union of low-dimensional subspaces and a set of outliers. The number of subspaces, their dimensions, and their orientations are unknown. A probabilistic performance analysis of the thresholding-based subspace clustering (TSC) algorithm introduced recently in [1] shows that TSC succeeds in the noisy case, even when the subspaces intersect. Our results reveal an explicit tradeoff between the allowed noise level and the affinity of the subspaces. We furthermore find that the simple outlier detection scheme introduced in [1] provably succeeds in the noisy case.


Nonlinear unmixing of hyperspectral images: models and algorithms

arXiv.org Machine Learning

When considering the problem of unmixing hyperspectral images, most of the literature in the geoscience and image processing areas relies on the widely used linear mixing model (LMM). However, the LMM may be not valid and other nonlinear models need to be considered, for instance, when there are multi-scattering effects or intimate interactions. Consequently, over the last few years, several significant contributions have been proposed to overcome the limitations inherent in the LMM. In this paper, we present an overview of recent advances in nonlinear unmixing modeling.


Variational Algorithms for Marginal MAP

arXiv.org Artificial Intelligence

The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of "mixed-product" message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel "argmax-product" message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods.


A New Convex Relaxation for Tensor Completion

arXiv.org Machine Learning

We study the problem of learning a tensor from a set of linear measurements. A prominent methodology for this problem is based on a generalization of trace norm regularization, which has been used extensively for learning low rank matrices, to the tensor setting. In this paper, we highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean ball. We then describe a technique to solve the associated regularization problem, which builds upon the alternating direction method of multipliers. Experiments on one synthetic dataset and two real datasets indicate that the proposed method improves significantly over tensor trace norm regularization in terms of estimation error, while remaining computationally tractable.


From Bandits to Experts: A Tale of Domination and Independence

arXiv.org Machine Learning

We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph. We also show that in the undirected case, the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner.


Universally Elevating the Phase Transition Performance of Compressed Sensing: Non-Isometric Matrices are Not Necessarily Bad Matrices

arXiv.org Machine Learning

In compressed sensing problems, $\ell_1$ minimization or Basis Pursuit was known to have the best provable phase transition performance of recoverable sparsity among polynomial-time algorithms. It is of great theoretical and practical interest to find alternative polynomial-time algorithms which perform better than $\ell_1$ minimization. \cite{Icassp reweighted l_1}, \cite{Isit reweighted l_1}, \cite{XuScaingLaw} and \cite{iterativereweightedjournal} have shown that a two-stage re-weighted $\ell_1$ minimization algorithm can boost the phase transition performance for signals whose nonzero elements follow an amplitude probability density function (pdf) $f(\cdot)$ whose $t$-th derivative $f^{t}(0) \neq 0$ for some integer $t \geq 0$. However, for signals whose nonzero elements are strictly suspended from zero in distribution (for example, constant-modulus, only taking values `$+d$' or `$-d$' for some nonzero real number $d$), no polynomial-time signal recovery algorithms were known to provide better phase transition performance than plain $\ell_1$ minimization, especially for dense sensing matrices. In this paper, we show that a polynomial-time algorithm can universally elevate the phase-transition performance of compressed sensing, compared with $\ell_1$ minimization, even for signals with constant-modulus nonzero elements. Contrary to conventional wisdoms that compressed sensing matrices are desired to be isometric, we show that non-isometric matrices are not necessarily bad sensing matrices. In this paper, we also provide a framework for recovering sparse signals when sensing matrices are not isometric.


Regret Bounds for Reinforcement Learning with Policy Advice

arXiv.org Machine Learning

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors. We present a reinforcement learning with policy advice (RLPA) algorithm which leverages this input set and learns to use the best policy in the set for the reinforcement learning task at hand. We prove that RLPA has a sub-linear regret of \tilde O(\sqrt{T}) relative to the best input policy, and that both this regret and its computational complexity are independent of the size of the state and action space. Our empirical simulations support our theoretical analysis. This suggests RLPA may offer significant advantages in large domains where some prior good policies are provided.


A Lipschitz Exploration-Exploitation Scheme for Bayesian Optimization

arXiv.org Machine Learning

The problem of optimizing unknown costly-to-evaluate functions has been studied for a long time in the context of Bayesian Optimization. Algorithms in this field aim to find the optimizer of the function by asking only a few function evaluations at locations carefully selected based on a posterior model. In this paper, we assume the unknown function is Lipschitz continuous. Leveraging the Lipschitz property, we propose an algorithm with a distinct exploration phase followed by an exploitation phase. The exploration phase aims to select samples that shrink the search space as much as possible. The exploitation phase then focuses on the reduced search space and selects samples closest to the optimizer. Considering the Expected Improvement (EI) as a baseline, we empirically show that the proposed algorithm significantly outperforms EI.