Goto

Collaborating Authors

 Bayesian Learning


Related Datasets in Oracle DV Machine Learning models

#artificialintelligence

Depending on the algorithm/model that generates this dataset metrics present in the dataset will vary. Here is a list of metrics based on the model: Linear Regression, CART numeric, Elastic Net Linear: R-Square, R-Square Adjusted, Mean Absolute Error(MAE), Mean Squared Error(MSE), Relative Absolute Error(RAE), Related Squared Error(RSE), Root Mean Squared Error(RMSE) CART(Classification And Regression Trees), Naive Bayes Classification, Neural Network, Support Vector Machine(SVM), Random Forest, Logistic Regression: Now you know what the Related datasets are and how they can be useful for fine tuning your Machine Learning model or for comparing two different models. .


Fast Meta-Learning for Adaptive Hierarchical Classifier Design

arXiv.org Machine Learning

The Bayes error rate (BER) is a central concept in the statistical theory of classification. It represents the error rate of the Bayes classifier, which assigns a label to an object corresponding to the class with the highest posterior probability. By definition, the Bayes error represents the smallest possible average error rate that can be achieved by any decision rule (Wald, 1947). Because of these properties, the BER is of great interest both for benchmarking classification algorithms as well as for the practical design of classification algorithms. For example, an accurate approximation of the BER can be used for classifier parameter selection, data dimensionality reduction, or variable selection. However, accurate BER approximation is difficult, especially in high dimension, and thus much attention has focused on tight and tractable BER bounds. This paper proposes a model-free approach to designing multiclass classifiers using a bias-corrected BER bound estimated directly from the multiclass data. There exists several useful bounds on the BER that are functions of the class-dependent feature distributions. These include information theoretic divergence measures such as the Chernoffα -divergence (Chernoff, 1952), the Bhattacharyya divergence (Kailath, 1967), or the Jensen-Shannon divergence (Lin, 1991).


A Separation Principle for Control in the Age of Deep Learning

arXiv.org Machine Learning

We review the problem of defining and inferring a "state" for a control system based on complex, high-dimensional, highly uncertain measurement streams such as videos. Such a state, or representation, should contain all and only the information needed for control, and discount nuisance variability in the data. It should also have finite complexity, ideally modulated depending on available resources. This representation is what we want to store in memory in lieu of the data, as it "separates" the control task from the measurement process. For the trivial case with no dynamics, a representation can be inferred by minimizing the Information Bottleneck Lagrangian in a function class realized by deep neural networks. The resulting representation has much higher dimension than the data, already in the millions, but it is smaller in the sense of information content, retaining only what is needed for the task. This process also yields representations that are invariant to nuisance factors and having maximally independent components. We extend these ideas to the dynamic case, where the representation is the posterior density of the task variable given the measurements up to the current time, which is in general much simpler than the prediction density maintained by the classical Bayesian filter. Again this can be finitely-parametrized using a deep neural network, and already some applications are beginning to emerge. No explicit assumption of Markovianity is needed; instead, complexity trades off approximation of an optimal representation, including the degree of Markovianity.


pg-Causality: Identifying Spatiotemporal Causal Pathways for Air Pollutants with Urban Big Data

arXiv.org Artificial Intelligence

Many countries are suffering from severe air pollution. Understanding how different air pollutants accumulate and propagate is critical to making relevant public policies. In this paper, we use urban big data (air quality data and meteorological data) to identify the \emph{spatiotemporal (ST) causal pathways} for air pollutants. This problem is challenging because: (1) there are numerous noisy and low-pollution periods in the raw air quality data, which may lead to unreliable causality analysis, (2) for large-scale data in the ST space, the computational complexity of constructing a causal structure is very high, and (3) the \emph{ST causal pathways} are complex due to the interactions of multiple pollutants and the influence of environmental factors. Therefore, we present \emph{p-Causality}, a novel pattern-aided causality analysis approach that combines the strengths of \emph{pattern mining} and \emph{Bayesian learning} to efficiently and faithfully identify the \emph{ST causal pathways}. First, \emph{Pattern mining} helps suppress the noise by capturing frequent evolving patterns (FEPs) of each monitoring sensor, and greatly reduce the complexity by selecting the pattern-matched sensors as "causers". Then, \emph{Bayesian learning} carefully encodes the local and ST causal relations with a Gaussian Bayesian network (GBN)-based graphical model, which also integrates environmental influences to minimize biases in the final results. We evaluate our approach with three real-world data sets containing 982 air quality sensors, in three regions of China from 01-Jun-2013 to 19-Dec-2015. Results show that our approach outperforms the traditional causal structure learning methods in time efficiency, inference accuracy and interpretability.


Variational Fourier features for Gaussian processes

arXiv.org Machine Learning

This work brings together two powerful concepts in Gaussian processes: the variational approach to sparse approximation and the spectral representation of Gaussian processes. This gives rise to an approximation that inherits the benefits of the variational approach but with the representational power and computational scalability of spectral representations. The work hinges on a key result that there exist spectral features related to a finite domain of the Gaussian process which exhibit almost-independent covariances. We derive these expressions for Matern kernels in one dimension, and generalize to more dimensions using kernels with specific structures. Under the assumption of additive Gaussian noise, our method requires only a single pass through the dataset, making for very fast and accurate computation. We fit a model to 4 million training points in just a few minutes on a standard laptop. With non-conjugate likelihoods, our MCMC scheme reduces the cost of computation from O(NM2) (for a sparse Gaussian process) to O(NM) per iteration, where N is the number of data and M is the number of features.


Naive Bayes in Machine Learning – Towards Data Science

@machinelearnbot

Bayes' theorem finds many uses in the probability theory and statistics. There's a micro chance that you have never heard about this theorem in your life. Turns out that this theorem has found its way into the world of machine learning, to form one of the highly decorated algorithms. In this article, we will learn all about the Naive Bayes Algorithm, along with its variations for different purposes in machine learning. As you might have guessed, this requires us to view things from a probabilistic point of view.


UCB Exploration via Q-Ensembles

arXiv.org Machine Learning

We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well established algorithms from the bandit setting, and adapt them to the $Q$-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.


A Tutorial on Canonical Correlation Methods

arXiv.org Machine Learning

Canonical correlation analysis is a family of multivariate statistical methods for the analysis of paired sets of variables. Since its proposition, canonical correlation analysis has for instance been extended to extract relations between two sets of variables when the sample size is insufficient in relation to the data dimensionality, when the relations have been considered to be non-linear, and when the dimensionality is too large for human interpretation. This tutorial explains the theory of canonical correlation analysis including its regularised, kernel, and sparse variants. Additionally, the deep and Bayesian CCA extensions are briefly reviewed. Together with the numerical examples, this overview provides a coherent compendium on the applicability of the variants of canonical correlation analysis. By bringing together techniques for solving the optimisation problems, evaluating the statistical significance and generalisability of the canonical correlation model, and interpreting the relations, we hope that this article can serve as a hands-on tool for applying canonical correlation methods in data analysis.


An efficient quantum algorithm for generative machine learning

arXiv.org Machine Learning

Duan 1,2 1 Center for Quantum Information, IIIS, Tsinghua University, Beijing 100084, PR China 2 Department of Physics, University of Michigan, Ann Arbor, Michigan 48109, USA A central task in the field of quantum computing is to find applications where quantum computer could provide exponential speedup over any classical computer [1-3]. Machine learning represents an important field with broad applications where quantum computer may offer significant speedup [4-8]. Several quantum algorithms for discriminative machine learning [9] have been found based on efficient solving of linear algebraic problems [10-15], with potential exponential speedup in runtime under the assumption of effective input from a quantum random access memory [16]. In machine learning, generative models represent another large class [9] which is widely used for both supervised and unsupervised learning [17, 18]. Here, we propose an efficient quantum algorithm for machine learning based on a quantum generative model. We prove that our proposed model is exponentially more powerful to represent probability distributions compared with classical generative models and has exponential speedup in training and inference at least for some instances under a reasonable assumption in computational complexity theory. Our result opens a new direction for quantum machine learning and offers a remarkable example in which a quantum algorithm shows exponential improvement over any classical algorithm in an important application field. Machine learning and artificial intelligence represent a very important application area which could be revolutionized by quantum computers with clever algorithms that offer exponential speedup [4, 5]. The candidate algorithms with potential exponential speedup so far rely on efficient quantum solution of linear system of equations or linear algebraic problems [12-15]. Those algorithms require quantum random access memory (QRAM) as a critical component in addition to a quantum computer. In a QRAM, the number of required quantum routers scales up exponentially with the number of qubits in those algorithms [16, 19]. This exponential overhead in resource requirement poses a significant challenge for its experimental implementation and is a caveat for fair comparison with corresponding classical algorithms [5, 20]. In this paper, we propose a quantum algorithm with potential exponential speedup for machine learning basedFigure 1: Classical and quantum generative models. A factor graph is a bipartite graph where one group of the vertices represent variables (denoted by circles) and the other group of vertices represent positive functions (denoted by squares) acting on connected variables. The corresponding probability distribution is given by the product of all these functions. Each variable connects to at most a constant number of functions which introduce correlations in the probability distribution.b,


Flexible statistical inference for mechanistic models of neural dynamics

arXiv.org Machine Learning

Mechanistic models of single-neuron dynamics have been extensively studied in computational neuroscience. However, identifying which models can quantitatively reproduce empirically measured data has been challenging. We propose to overcome this limitation by using likelihood-free inference approaches (also known as Approximate Bayesian Computation, ABC) to perform full Bayesian inference on single-neuron models. Our approach builds on recent advances in ABC by learning a neural network which maps features of the observed data to the posterior distribution over parameters. We learn a Bayesian mixture-density network approximating the posterior over multiple rounds of adaptively chosen simulations. Furthermore, we propose an efficient approach for handling missing features and parameter settings for which the simulator fails, as well as a strategy for automatically learning relevant features using recurrent neural networks. On synthetic data, our approach efficiently estimates posterior distributions and recovers ground-truth parameters. On in-vitro recordings of membrane voltages, we recover multivariate posteriors over biophysical parameters, which yield model-predicted voltage traces that accurately match empirical data. Our approach will enable neuroscientists to perform Bayesian inference on complex neuron models without having to design model-specific algorithms, closing the gap between mechanistic and statistical approaches to single-neuron modelling.