Uncertainty
Naive Bayes in Machine Learning - Towards Data Science
Bayes' theorem finds many uses in probability theory and statistics. There is only a small chance that you have never heard of it. The theorem has also found its way into machine learning, where it underpins one of the field's most highly regarded algorithms. In this article, we will learn all about the Naive Bayes algorithm, along with its variations for different purposes in machine learning. As you might have guessed, this requires us to view things from a probabilistic point of view.
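As a rough illustration of the idea, here is a minimal sketch of a categorical Naive Bayes classifier over token lists. The toy data, the function names `train_naive_bayes` and `predict`, and the Laplace smoothing constant `alpha` are assumptions made for this example and do not come from the article.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, alpha=1.0):
    """Count classes and per-class tokens from (tokens, label) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        for tok in tokens:
            token_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, token_counts, vocab, alpha


def predict(model, tokens):
    """Return the label maximising log P(label) + sum of log P(token | label)."""
    class_counts, token_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(token_counts[label].values()) + alpha * len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are skipped
                # Laplace-smoothed likelihood; the "naive" step treats
                # tokens as independent given the label.
                score += math.log((token_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
data = [
    (["free", "prize", "now"], "spam"),
    (["win", "free", "cash"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]
model = train_naive_bayes(data)
print(predict(model, ["free", "cash"]))
print(predict(model, ["noon", "meeting"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.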
A Tutorial on Canonical Correlation Methods
Uurtio, Viivi, Monteiro, João M., Kandola, Jaz, Shawe-Taylor, John, Fernandez-Reyes, Delmiro, Rousu, Juho
Canonical correlation analysis is a family of multivariate statistical methods for the analysis of paired sets of variables. Since its proposition, canonical correlation analysis has for instance been extended to extract relations between two sets of variables when the sample size is insufficient in relation to the data dimensionality, when the relations have been considered to be non-linear, and when the dimensionality is too large for human interpretation. This tutorial explains the theory of canonical correlation analysis including its regularised, kernel, and sparse variants. Additionally, the deep and Bayesian CCA extensions are briefly reviewed. Together with the numerical examples, this overview provides a coherent compendium on the applicability of the variants of canonical correlation analysis. By bringing together techniques for solving the optimisation problems, evaluating the statistical significance and generalisability of the canonical correlation model, and interpreting the relations, we hope that this article can serve as a hands-on tool for applying canonical correlation methods in data analysis.
Online Tool Condition Monitoring Based on Parsimonious Ensemble+
Pratama, Mahardhika, Dimla, Eric, Lughofer, Edwin, Pedrycz, Witold, Tjahjowidowo, Tegoeh
Accurate diagnosis of tool wear in the metal turning process remains an open challenge for both scientists and industrial practitioners because of inhomogeneities in workpiece material, nonstationary machining settings to suit production requirements, and nonlinear relations between measured variables and tool wear. Common methodologies for tool condition monitoring still rely on batch approaches, which cannot cope with the fast sampling rate of the metal cutting process. Furthermore, they require a retraining process to be completed from scratch when dealing with a new set of machining parameters. This paper presents an online tool condition monitoring approach based on Parsimonious Ensemble+, pENsemble+. The unique feature of pENsemble+ lies in its highly flexible principle, where both the ensemble structure and the base-classifier structure can automatically grow and shrink on the fly based on the characteristics of data streams. Moreover, an online feature selection scenario is integrated to actively sample relevant input attributes. The paper presents an advancement of a newly developed ensemble learning algorithm, pENsemble+, in which an online active learning scenario is incorporated to reduce operator labelling effort. An ensemble merging scenario is proposed that allows reduction of ensemble complexity while retaining its diversity. Experimental studies utilising real-world manufacturing data streams and comparisons with well-known algorithms were carried out. Furthermore, the efficacy of pENsemble was examined using benchmark concept drift data streams. It has been found that pENsemble+ incurs low structural complexity and results in a significant reduction of operator labelling effort.
An efficient quantum algorithm for generative machine learning
Gao, Xun, Zhang, Zhengyu, Duan, Luming
Affiliations: (1) Center for Quantum Information, IIIS, Tsinghua University, Beijing 100084, PR China; (2) Department of Physics, University of Michigan, Ann Arbor, Michigan 48109, USA.
A central task in the field of quantum computing is to find applications where a quantum computer could provide exponential speedup over any classical computer [1-3]. Machine learning represents an important field with broad applications where a quantum computer may offer significant speedup [4-8]. Several quantum algorithms for discriminative machine learning [9] have been found based on efficient solving of linear algebraic problems [10-15], with potential exponential speedup in runtime under the assumption of effective input from a quantum random access memory [16]. In machine learning, generative models represent another large class [9] which is widely used for both supervised and unsupervised learning [17, 18]. Here, we propose an efficient quantum algorithm for machine learning based on a quantum generative model. We prove that our proposed model is exponentially more powerful at representing probability distributions than classical generative models and has exponential speedup in training and inference at least for some instances under a reasonable assumption in computational complexity theory. Our result opens a new direction for quantum machine learning and offers a remarkable example in which a quantum algorithm shows exponential improvement over any classical algorithm in an important application field.
Machine learning and artificial intelligence represent a very important application area which could be revolutionized by quantum computers with clever algorithms that offer exponential speedup [4, 5]. The candidate algorithms with potential exponential speedup so far rely on efficient quantum solution of linear systems of equations or linear algebraic problems [12-15].
Those algorithms require quantum random access memory (QRAM) as a critical component in addition to a quantum computer. In a QRAM, the number of required quantum routers scales up exponentially with the number of qubits in those algorithms [16, 19]. This exponential overhead in resource requirement poses a significant challenge for its experimental implementation and is a caveat for fair comparison with corresponding classical algorithms [5, 20]. In this paper, we propose a quantum algorithm with potential exponential speedup for machine learning based
Figure 1: Classical and quantum generative models. A factor graph is a bipartite graph where one group of the vertices represent variables (denoted by circles) and the other group of vertices represent positive functions (denoted by squares) acting on connected variables. The corresponding probability distribution is given by the product of all these functions. Each variable connects to at most a constant number of functions which introduce correlations in the probability distribution.
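The classical factor-graph description in the caption can be sketched in a few lines. This is a minimal illustration under stated assumptions: the variables are binary, the joint distribution is obtained by brute-force enumeration, and the product of factors is normalised so that it sums to one. The function name `joint_distribution`, the `agree` factor, and the three-variable chain are hypothetical and not taken from the paper.

```python
from itertools import product


def joint_distribution(num_vars, factors):
    """Normalised product of positive factors over binary variables.

    `factors` is a list of (scope, fn) pairs: `scope` is a tuple of variable
    indices and `fn` maps the values of those variables to a positive number.
    """
    weights = {}
    for assignment in product((0, 1), repeat=num_vars):
        weight = 1.0
        for scope, fn in factors:
            weight *= fn(*(assignment[i] for i in scope))
        weights[assignment] = weight
    z = sum(weights.values())  # normalising constant (partition function)
    return {a: w / z for a, w in weights.items()}


def agree(a, b):
    """Hypothetical pairwise factor that favours equal neighbours."""
    return 2.0 if a == b else 1.0


# Chain x0 - x1 - x2: each variable touches at most two factors.
dist = joint_distribution(3, [((0, 1), agree), ((1, 2), agree)])
print(dist[(0, 0, 0)], dist[(0, 1, 0)])
```

Enumeration costs time exponential in the number of variables, which is why this only works for toy graphs.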
Flexible statistical inference for mechanistic models of neural dynamics
Lueckmann, Jan-Matthis, Goncalves, Pedro J., Bassetto, Giacomo, Öcal, Kaan, Nonnenmacher, Marcel, Macke, Jakob H.
Mechanistic models of single-neuron dynamics have been extensively studied in computational neuroscience. However, identifying which models can quantitatively reproduce empirically measured data has been challenging. We propose to overcome this limitation by using likelihood-free inference approaches (also known as Approximate Bayesian Computation, ABC) to perform full Bayesian inference on single-neuron models. Our approach builds on recent advances in ABC by learning a neural network which maps features of the observed data to the posterior distribution over parameters. We learn a Bayesian mixture-density network approximating the posterior over multiple rounds of adaptively chosen simulations. Furthermore, we propose an efficient approach for handling missing features and parameter settings for which the simulator fails, as well as a strategy for automatically learning relevant features using recurrent neural networks. On synthetic data, our approach efficiently estimates posterior distributions and recovers ground-truth parameters. On in-vitro recordings of membrane voltages, we recover multivariate posteriors over biophysical parameters, which yield model-predicted voltage traces that accurately match empirical data. Our approach will enable neuroscientists to perform Bayesian inference on complex neuron models without having to design model-specific algorithms, closing the gap between mechanistic and statistical approaches to single-neuron modelling.
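For readers new to likelihood-free inference, the sketch below shows plain rejection ABC, the simplest member of the family. It is not the mixture-density-network method proposed in the paper. The exponential-interval simulator, the mean summary statistic, the uniform prior, and the tolerance are all assumptions chosen for this toy example.

```python
import random


def simulate(rate, n, rng):
    """Toy simulator: n exponential inter-event intervals with the given rate."""
    return [rng.expovariate(rate) for _ in range(n)]


def summary(data):
    """Summary feature of a data set: here simply the sample mean."""
    return sum(data) / len(data)


def rejection_abc(observed, prior_sampler, n_draws, tol, rng):
    """Keep prior draws whose simulated summary lies within tol of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tol:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 200, rng)  # pretend the true rate (2.0) is unknown
posterior = rejection_abc(
    observed,
    prior_sampler=lambda r: r.uniform(0.1, 5.0),  # assumed uniform prior
    n_draws=5000,
    tol=0.05,
    rng=rng,
)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), posterior_mean)
```

The accepted draws approximate the posterior without ever evaluating a likelihood. Rejection sampling wastes most simulations, which is the inefficiency that learned posterior approximations aim to reduce.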
Simultaneous Block-Sparse Signal Recovery Using Pattern-Coupled Sparse Bayesian Learning
Xiao, Hang, Xing, Zhengli, Yang, Linxiao, Fang, Jun, Wu, Yanlun
In this paper, we consider the block-sparse signal recovery problem in the context of multiple measurement vectors (MMV) with common row sparsity patterns. We develop a new method for recovering common row sparsity MMV signals, in which a pattern-coupled hierarchical Gaussian prior model is introduced to characterize both the block-sparsity of the coefficients and the statistical dependency between neighboring coefficients of the common row sparsity MMV signals. Unlike many other methods, the proposed method is able to automatically capture the block-sparse structure of the unknown signal. Our method is developed using an expectation-maximization (EM) framework. Simulation results show that our proposed method offers competitive performance in recovering block-sparse common row sparsity pattern MMV signals.
Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting
Morrison, Rebecca E., Baptista, Ricardo, Marzouk, Youssef
We present an algorithm to identify sparse dependence structure in continuous and non-Gaussian probability distributions, given a corresponding set of data. The conditional independence structure of an arbitrary distribution can be represented as an undirected graph (or Markov random field), but most algorithms for learning this structure are restricted to the discrete or Gaussian cases. Our new approach allows for more realistic and accurate descriptions of the distribution in question, and in turn better estimates of its sparse Markov structure. Sparsity in the graph is of interest as it can accelerate inference, improve sampling methods, and reveal important dependencies between variables. The algorithm relies on exploiting the connection between the sparsity of the graph and the sparsity of transport maps, which deterministically couple one probability measure to another.
Sum-Product Networks for Hybrid Domains
Molina, Alejandro, Vergari, Antonio, Di Mauro, Nicola, Natarajan, Sriraam, Esposito, Floriana, Kersting, Kristian
While all kinds of mixed data (from personal data, through panel and scientific data, to public and commercial data) are collected and stored, building probabilistic graphical models for these hybrid domains becomes more difficult. Users spend significant amounts of time identifying the parametric form of the random variables (Gaussian, Poisson, Logit, etc.) involved and learning the mixed models. To make this difficult task easier, we propose the first trainable probabilistic deep architecture for hybrid domains that features tractable queries. It is based on Sum-Product Networks (SPNs) with piecewise polynomial leaf distributions together with novel nonparametric decomposition and conditioning steps using the Hirschfeld-Gebelein-Rényi Maximum Correlation Coefficient. This relieves the user from deciding a priori the parametric form of the random variables but is still expressive enough to effectively approximate any continuous distribution and permits efficient learning and inference. Our empirical evidence shows that the architecture, called Mixed SPNs, can indeed capture complex distributions across a wide range of hybrid domains.
Trimmed Density Ratio Estimation
Liu, Song, Takeda, Akiko, Suzuki, Taiji, Fukumizu, Kenji
Density ratio estimation (DRE) [18, 11, 27] is an important tool in various branches of machine learning and statistics. Because it directly models the differences between two probability density functions, DRE finds applications in change detection [13, 6], two-sample tests [32] and outlier detection [1, 26]. In recent years, a sampling framework called Generative Adversarial Network (GAN) (see e.g., [9, 19]) has used the density ratio function to compare artificial samples from a generative distribution with real samples from an unknown distribution. DRE has also been widely discussed in the statistical literature for adjusting nonparametric density estimation [5], stabilizing the estimation of heavy-tailed distributions [7] and fitting multiple distributions at once [8]. However, as a density ratio function can grow unbounded, DRE can suffer from robustness and stability issues: a few corrupted points may completely mislead the estimator (see Figure 2 in Section 6 for example).
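To make the unboundedness concrete, here is a small sketch that evaluates the exact density ratio of two zero-mean Gaussians. The choice of distributions (standard deviation 1.0 in the numerator, 0.5 in the denominator) is an assumption made purely for illustration and does not appear in the paper.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def density_ratio(x):
    """Ratio p(x) / q(x) with p = N(0, 1) and q = N(0, 0.5^2)."""
    return gaussian_pdf(x, 0.0, 1.0) / gaussian_pdf(x, 0.0, 0.5)


for x in (0.0, 1.0, 2.0, 3.0):
    print(x, density_ratio(x))
```

Here the ratio equals 0.5 * exp(1.5 * x^2), so a single sample far in the tail of the denominator distribution carries an enormous ratio value. This is the kind of point that can dominate an untrimmed estimator.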
On Bayesian index policies for sequential resource allocation
This paper is about index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view on the problem. Our main contribution is to prove that the Bayes-UCB algorithm, which relies on quantiles of posterior distributions, is asymptotically optimal when the reward distributions belong to a one-dimensional exponential family, for a large class of prior distributions. We also show that the Bayesian literature gives new insight on what kind of exploration rates could be used in frequentist, UCB-type algorithms. Indeed, approximations of the Bayesian optimal solution or the Finite Horizon Gittins indices provide a justification for the kl-UCB+ and kl-UCB-H+ algorithms, whose asymptotic optimality is also established.
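As a rough sketch of what a UCB-type index looks like, the code below computes a kl-UCB-style index for Bernoulli rewards by bisection. It assumes that kl-UCB+ uses the exploration rate log(t / N), where N is the number of pulls of the arm. That is one reading of the algorithm name and should be checked against the paper. The function names are hypothetical.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_ucb_index(mean, pulls, budget, tol=1e-9):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= budget, found by bisection."""
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still inside the confidence region
        else:
            hi = mid
    return lo


def kl_ucb_plus_index(mean, pulls, t):
    """Index with the assumed exploration rate log(t / pulls), floored at zero."""
    budget = max(math.log(t / pulls), 0.0)
    return kl_ucb_index(mean, pulls, budget)


# An arm pulled rarely gets a wider optimistic bonus than one pulled often.
rarely_pulled = kl_ucb_plus_index(0.5, pulls=10, t=100)
often_pulled = kl_ucb_plus_index(0.5, pulls=50, t=100)
print(rarely_pulled, often_pulled)
```

An index policy then plays the arm with the largest index at each round. Bisection works because the KL divergence is increasing in q on the interval above the empirical mean.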