Goto

Collaborating Authors

 Learning Graphical Models


The Top 10 Topics in Machine Learning Revisited: A Quantitative Meta-Study

arXiv.org Machine Learning

Which topics of machine learning are most commonly addressed in research? This question was initially answered in 2007 by doing a qualitative survey among distinguished researchers. In our study, we revisit this question from a quantitative perspective. Concretely, we collect 54K abstracts of papers published between 2007 and 2016 in leading machine learning journals and conferences. We then use machine learning in order to determine the top 10 topics in machine learning. We not only include models, but provide a holistic view across optimization, data, features, etc. This quantitative approach allows reducing the bias of surveys. It reveals new and up-to-date insights into what the 10 most prolific topics in machine learning research are. This allows researchers to identify popular topics as well as new and rising topics for their research.


Marginal likelihood based model comparison in Fuzzy Bayesian Learning

arXiv.org Machine Learning

RADITIONAL rule based fuzzy systems encode expert opinion in the form of IF-THEN rules and optimise some performance metric (typically mean squared error in predicting a data-set) to obtain the hyper-parameters of the model (like the numeric values defining the shape of the membership functions etc.) [2]-[4]. The rule base is one of the core elements driving the model and slight change in the rule base would significantly affect the performance of the fuzzy inference system. Many automated methods have been proposed where the rule base structure and parameters can be automatically determined [5]-[7]. However interpretability of such models is an issue and various methods have been proposed to alleviate the issue [8]. In the present paper however, we are interested in the actual metric for comparison between different models having different rule bases derived from expert opinion. The comparison metric can nevertheless be embedded within an automated framework to evolve the best model if so required. The most straight forward way of comparing the fuzzy rule bases is to optimise the model parameters based on the prediction error (e.g.


Multi-fidelity Gaussian Process Bandit Optimisation

arXiv.org Artificial Intelligence

In many scientific and engineering applications, we are tasked with the optimisation of an expensive to evaluate black box function $f$. Traditional settings for this problem assume just the availability of this single function. However, in many cases, cheap approximations to $f$ may be obtainable. For example, the expensive real world behaviour of a robot can be approximated by a cheap computer simulation. We can use these approximations to eliminate low function value regions cheaply and use the expensive evaluations of $f$ in a small but promising region and speedily identify the optimum. We formalise this task as a \emph{multi-fidelity} bandit problem where the target function and its approximations are sampled from a Gaussian process. We develop MF-GP-UCB, a novel method based on upper confidence bound techniques. In our theoretical analysis we demonstrate that it exhibits precisely the above behaviour, and achieves better regret than strategies which ignore multi-fidelity information. Empirically, MF-GP-UCB outperforms such naive strategies and other multi-fidelity methods on several synthetic and real experiments.


A flexible state space model for learning nonlinear dynamical systems

arXiv.org Machine Learning

We consider a nonlinear state-space model with the state transition and observation functions expressed as basis function expansions. The coefficients in the basis function expansions are learned from data. Using a connection to Gaussian processes we also develop priors on the coefficients, for tuning the model flexibility and to prevent overfitting to data, akin to a Gaussian process state-space model. The priors can alternatively be seen as a regularization, and helps the model in generalizing the data without sacrificing the richness offered by the basis function expansion. To learn the coefficients and other unknown parameters efficiently, we tailor an algorithm using state-of-the-art sequential Monte Carlo methods, which comes with theoretical guarantees on the learning. Our approach indicates promising results when evaluated on a classical benchmark as well as real data.


Particle Filtering for PLCA model with Application to Music Transcription

arXiv.org Machine Learning

Automatic Music Transcription (AMT) consists in automatically estimating the notes in an audio recording, through three attributes: onset time, duration and pitch. Probabilistic Latent Component Analysis (PLCA) has become very popular for this task. PLCA is a spectrogram factorization method, able to model a magnitude spectrogram as a linear combination of spectral vectors from a dictionary. Such methods use the Expectation-Maximization (EM) algorithm to estimate the parameters of the acoustic model. This algorithm presents well-known inherent defaults (local convergence, initialization dependency), making EM-based systems limited in their applications to AMT, particularly in regards to the mathematical form and number of priors. To overcome such limits, we propose in this paper to employ a different estimation framework based on Particle Filtering (PF), which consists in sampling the posterior distribution over larger parameter ranges. This framework proves to be more robust in parameter estimation, more flexible and unifying in the integration of prior knowledge in the system. Note-level transcription accuracies of 61.8 $\%$ and 59.5 $\%$ were achieved on evaluation sound datasets of two different instrument repertoires, including the classical piano (from MAPS dataset) and the marovany zither, and direct comparisons to previous PLCA-based approaches are provided. Steps for further development are also outlined.


Unifying the Stochastic Spectral Descent for Restricted Boltzmann Machines with Bernoulli or Gaussian Inputs

arXiv.org Machine Learning

Stochastic gradient descent based algorithms are typically used as the general optimization tools for most deep learning models. A Restricted Boltzmann Machine (RBM) is a probabilistic generative model that can be stacked to construct deep architectures. For RBM with Bernoulli inputs, non-Euclidean algorithm such as stochastic spectral descent (SSD) has been specifically designed to speed up the convergence with improved use of the gradient estimation by sampling methods. However, the existing algorithm and corresponding theoretical justification depend on the assumption that the possible configurations of inputs are finite, like binary variables. The purpose of this paper is to generalize SSD for Gaussian RBM being capable of mod- eling continuous data, regardless of the previous assumption. We propose the gradient descent methods in non-Euclidean space of parameters, via de- riving the upper bounds of logarithmic partition function for RBMs based on Schatten-infinity norm. We empirically show that the advantage and improvement of SSD over stochastic gradient descent (SGD).


Solving Non-parametric Inverse Problem in Continuous Markov Random Field using Loopy Belief Propagation

arXiv.org Machine Learning

In this paper, we address the inverse problem, or the statistical machine learning problem, in Markov random fields with a non-parametric pair-wise energy function with continuous variables. The inverse problem is formulated by maximum likelihood estimation. The exact treatment of maximum likelihood estimation is intractable because of two problems: (1) it includes the evaluation of the partition function and (2) it is formulated in the form of functional optimization. We avoid Problem (1) by using Bethe approximation. Bethe approximation is an approximation technique equivalent to the loopy belief propagation. Problem (2) can be solved by using orthonormal function expansion. Orthonormal function expansion can reduce a functional optimization problem to a function optimization problem. Our method can provide an analytic form of the solution of the inverse problem within the framework of Bethe approximation.


Thompson Sampling for Linear-Quadratic Control Problems

arXiv.org Machine Learning

We consider the exploration-exploitation tradeoff in linear quadratic (LQ) control problems, where the state dynamics is linear and the cost function is quadratic in states and controls. We analyze the regret of Thompson sampling (TS) (a.k.a. posterior-sampling for reinforcement learning) in the frequentist setting, i.e., when the parameters characterizing the LQ dynamics are fixed. Despite the empirical and theoretical success in a wide range of problems from multi-armed bandit to linear bandit, we show that when studying the frequentist regret TS in control problems, we need to trade-off the frequency of sampling optimistic parameters and the frequency of switches in the control policy. This results in an overall regret of $O(T^{2/3})$, which is significantly worse than the regret $O(\sqrt{T})$ achieved by the optimism-in-face-of-uncertainty algorithm in LQ control problems.


Sparse Multi-Output Gaussian Processes for Medical Time Series Prediction

arXiv.org Machine Learning

In real-time monitoring of hospital patients, high-quality inference of patients' health status using all information available from clinical covariates and lab tests are essential to enable successful medical interventions and improve patient outcomes. In this work, we develop and explore a Bayesian nonparametric model based on Gaussian process (GP) regression for hospital patient monitoring. Our method, MedGP, incorporates 24 clinical and lab covariates and supports a rich reference data set from which the relationships between these observed covariates may be inferred and exploited for high-quality inference of patient state over time. To do this, we develop a highly structured sparse GP kernel to enable tractable computation over tens of thousands of time points while estimating correlations among clinical covariates, patients, and periodicity in high-dimensional time series measurements of physiological signals. We apply MedGP to data from hundreds of thousands of patients treated at the Hospital of the University of Pennsylvania. MedGP has a number of benefits over current methods, including (i) not requiring an alignment of the time series data, (ii) quantifying confidence intervals in the predictions, (iii) exploiting a vast and rich database of patients, and (iv) providing interpretable relationships among clinical covariates. We evaluate and compare results from MedGP on the task of online state prediction for three different patient subgroups. Keywords: Gaussian processes, electronic health records, sparse time series analysis, spectral mixture kernel, kernel density estimation.


Linear Thompson Sampling Revisited

arXiv.org Machine Learning

We derive an alternative proof for the regret of Thompson sampling (\ts) in the stochastic linear bandit setting. While we obtain a regret bound of order $\widetilde{O}(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of the \ts. We leverage on the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated to \textit{optimistic} parameters does control it. Thus we show that \ts can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.