Goto

Collaborating Authors

 Undirected Networks


Asymptotically Optimal Sequential Experimentation Under Generalized Ranking

arXiv.org Machine Learning

We consider the \mnk{classical} problem of a controller activating (or sampling) sequentially from a finite number of $N \geq 2$ populations, specified by unknown distributions. Over some time horizon, at each time $n = 1, 2, \ldots$, the controller wishes to select a population to sample, with the goal of sampling from a population that optimizes some "score" function of its distribution, e.g., maximizing the expected sum of outcomes or minimizing variability. We define a class of \textit{Uniformly Fast (UF)} sampling policies and show, under mild regularity conditions, that there is an asymptotic lower bound for the expected total number of sub-optimal population activations. Then, we provide sufficient conditions under which a UCB policy is UF and asymptotically optimal, since it attains this lower bound. Explicit solutions are provided for a number of examples of interest, including general score functionals on unconstrained Pareto distributions (of potentially infinite mean), and uniform distributions of unknown support. Additional results on bandits of Normal distributions are also provided.


Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint

arXiv.org Machine Learning

We develop asymptotically optimal policies for the multi armed bandit (MAB), problem, under a cost constraint. This model is applicable in situations where each sample (or activation) from a population (bandit) incurs a known bandit dependent cost. Successive samples from each population are iid random variables with unknown distribution. The objective is to design a feasible policy for deciding from which population to sample from, so as to maximize the expected sum of outcomes of $n$ total samples or equivalently to minimize the regret due to lack on information on sample distributions, For this problem we consider the class of feasible uniformly fast (f-UF) convergent policies, that satisfy the cost constraint sample-path wise. We first establish a necessary asymptotic lower bound for the rate of increase of the regret function of f-UF policies. Then we construct a class of f-UF policies and provide conditions under which they are asymptotically optimal within the class of f-UF policies, achieving this asymptotic lower bound. At the end we provide the explicit form of such policies for the case in which the unknown distributions are Normal with unknown means and known variances.


Learning a Hybrid Architecture for Sequence Regression and Annotation

arXiv.org Machine Learning

When learning a hidden Markov model (HMM), sequen- tial observations can often be complemented by real-valued summary response variables generated from the path of hid- den states. Such settings arise in numerous domains, includ- ing many applications in biology, like motif discovery and genome annotation. In this paper, we present a flexible frame- work for jointly modeling both latent sequence features and the functional mapping that relates the summary response variables to the hidden state sequence. The algorithm is com- patible with a rich set of mapping functions. Results show that the availability of additional continuous response vari- ables can simultaneously improve the annotation of the se- quential observations and yield good prediction performance in both synthetic data and real-world datasets.


On Russian Roulette Estimates for Bayesian Inference with Doubly-Intractable Likelihoods

arXiv.org Machine Learning

A large number of statistical models are "doubly-intractable": the likelihood normalising term, which is a function of the model parameters, is intractable, as well as the marginal likelihood (model evidence). This means that standard inference techniques to sample from the posterior, such as Markov chain Monte Carlo (MCMC), cannot be used. Examples include, but are not confined to, massive Gaussian Markov random fields, autologistic models and Exponential random graph models. A number of approximate schemes based on MCMC techniques, Approximate Bayesian computation (ABC) or analytic approximations to the posterior have been suggested, and these are reviewed here. Exact MCMC schemes, which can be applied to a subset of doubly-intractable distributions, have also been developed and are described in this paper. As yet, no general method exists which can be applied to all classes of models with doubly-intractable posteriors. In addition, taking inspiration from the Physics literature, we study an alternative method based on representing the intractable likelihood as an infinite series. Unbiased estimates of the likelihood can then be obtained by finite time stochastic truncation of the series via Russian Roulette sampling, although the estimates are not necessarily positive. Results from the Quantum Chromodynamics literature are exploited to allow the use of possibly negative estimates in a pseudo-marginal MCMC scheme such that expectations with respect to the posterior distribution are preserved. The methodology is reviewed on well-known examples such as the parameters in Ising models, the posterior for Fisher-Bingham distributions on the $d$-Sphere and a large-scale Gaussian Markov Random Field model describing the Ozone Column data. This leads to a critical assessment of the strengths and weaknesses of the methodology with pointers to ongoing research.


Partial Reinitialisation for Optimisers

arXiv.org Machine Learning

Heuristic optimisers which search for an optimal configuration of variables relative to an objective function often get stuck in local optima where the algorithm is unable to find further improvement. The standard approach to circumvent this problem involves periodically restarting the algorithm from random initial configurations when no further improvement can be found. We propose a method of partial reinitialization, whereby, in an attempt to find a better solution, only sub-sets of variables are re-initialised rather than the whole configuration. Much of the information gained from previous runs is hence retained. This leads to significant improvements in the quality of the solution found in a given time for a variety of optimisation problems in machine learning.


Approximate Message Passing with Restricted Boltzmann Machine Priors

arXiv.org Machine Learning

Approximate Message Passing (AMP) has been shown to be an excellent statistical approach to signal inference and compressed sensing problem. The AMP framework provides modularity in the choice of signal prior; here we propose a hierarchical form of the Gauss-Bernouilli prior which utilizes a Restricted Boltzmann Machine (RBM) trained on the signal support to push reconstruction performance beyond that of simple iid priors for signals whose support can be well represented by a trained binary RBM. We present and analyze two methods of RBM factorization and demonstrate how these affect signal reconstruction performance within our proposed algorithm. Finally, using the MNIST handwritten digit dataset, we show experimentally that using an RBM allows AMP to approach oracle-support performance.


Stochastic Collapsed Variational Inference for Sequential Data

arXiv.org Machine Learning

Stochastic variational inference for collapsed models has recently been successfully applied to large scale topic modelling. In this paper, we propose a stochastic collapsed variational inference algorithm in the sequential data setting. Our algorithm is applicable to both finite hidden Markov models and hierarchical Dirichlet process hidden Markov models, and to any datasets generated by emission distributions in the exponential family. Our experiment results on two discrete datasets show that our inference is both more efficient and more accurate than its uncollapsed version, stochastic variational inference.


Stochastic Collapsed Variational Inference for Hidden Markov Models

arXiv.org Machine Learning

Hidden Markov models (HMMs) [1] are popular probabilistic models for modelling sequential data in a variety of fields including natural language processing, speech recognition, weather forecasting, financial prediction and bioinformatics. However, their traditional inference methods such as vari-ational inference (VI) [2] and Markov chain Monte Carlo (MCMC) [3] are not readily scalable to large datasets. For example, one dataset in our experiment consists of 100 million observations. An important milestone for scaling VI was made by Hoffman et al. [4], who proposed stochastic VI (SVI) that computes cheap gradients based on minibatches of data, updating the model parameters before a complete pass of the full dataset. A recent scalable and more accurate algorithm was proposed by Foulds et al. [5], who applied such stochastic optimization to the collapsed latent Dirichlet allocation (LDA) [6], and their stochastic collapsed variational inference (SCVI) algorithm has been successful in large scale topic modelling.


Variational Particle Approximations

arXiv.org Machine Learning

Approximate inference in high-dimensional, discrete probabilistic models is a central problem in computational statistics and machine learning. This paper describes discrete particle variational inference (DPVI), a new approach that combines key strengths of Monte Carlo, variational and search-based techniques. DPVI is based on a novel family of particle-based variational approximations that can be fit using simple, fast, deterministic search techniques. Like Monte Carlo, DPVI can handle multiple modes, and yields exact results in a well-defined limit. Like unstructured mean-field, DPVI is based on optimizing a lower bound on the partition function; when this quantity is not of intrinsic interest, it facilitates convergence assessment and debugging. Like both Monte Carlo and combinatorial search, DPVI can take advantage of factorization, sequential structure, and custom search operators. This paper defines DPVI particle-based approximation family and partition function lower bounds, along with the sequential DPVI and local DPVI algorithm templates for optimizing them. DPVI is illustrated and evaluated via experiments on lattice Markov Random Fields, nonparametric Bayesian mixtures and block-models, and parametric as well as non-parametric hidden Markov models. Results include applications to real-world spike-sorting and relational modeling problems, and show that DPVI can offer appealing time/accuracy trade-offs as compared to multiple alternatives.


Reinforcement Learning with Parameterized Actions

arXiv.org Artificial Intelligence

We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions--discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with that action. We introduce the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goalscoring and Platform domains.