 Uncertainty


Approximate Inference with Amortised MCMC

arXiv.org Machine Learning

We propose a novel approximate inference framework that approximates a target distribution by amortising the dynamics of a user-selected Markov chain Monte Carlo (MCMC) sampler. The idea is to initialise MCMC using samples from an approximation network, apply the MCMC operator to improve these samples, and finally use the samples to update the approximation network, thereby improving its quality. This provides a new generic framework for approximate inference, allowing us to deploy highly complex or implicitly defined approximation families with intractable densities, including approximations produced by warping a source of randomness through a deep neural network. Experiments consider Bayesian neural network classification and image modelling with deep generative models. Deep models trained using amortised MCMC are shown to generate realistic-looking samples and to produce diverse imputations for images with regions of missing pixels.
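
The three-step loop in this abstract (initialise from the approximation, improve with MCMC, update the approximation) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' method: the target N(3, 1), the one-dimensional Gaussian sampler standing in for the approximation network, the random-walk Metropolis-Hastings kernel, and the damped moment-matching update are all choices made here for brevity. The abstract does not say which update rule is used, and moment matching only works because this toy sampler has a tractable Gaussian form.

```python
import numpy as np

def log_target(x):
    # Toy target (assumption): unnormalised log density of N(3, 1).
    return -0.5 * (x - 3.0) ** 2

def mh_steps(x, rng, n_steps=5, scale=1.0):
    # A few random-walk Metropolis-Hastings steps applied to a batch of samples.
    for _ in range(n_steps):
        proposal = x + scale * rng.standard_normal(x.shape)
        log_ratio = log_target(proposal) - log_target(x)
        accept = np.log(rng.random(x.shape)) < log_ratio
        x = np.where(accept, proposal, x)
    return x

def amortise(n_iters=500, batch=512, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    mu, var = 0.0, 1.0  # parameters of the Gaussian sampler q (hypothetical stand-in)
    for _ in range(n_iters):
        # 1. Initialise the chain with samples from the current approximation.
        x0 = mu + np.sqrt(var) * rng.standard_normal(batch)
        # 2. Improve the samples with the MCMC operator.
        xT = mh_steps(x0, rng)
        # 3. Move q towards the improved samples (damped moment matching).
        mu = (1 - lr) * mu + lr * xT.mean()
        var = (1 - lr) * var + lr * xT.var()
    return mu, np.sqrt(var)
```

Calling `amortise()` returns the fitted mean and standard deviation, which should drift towards those of the toy target as the loop runs.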


10 Free Must-Read Books for Machine Learning and Data Science

@machinelearnbot

This book provides an introduction to statistical learning methods. It is aimed at upper-level undergraduate students, master's students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations of how to implement the various methods in real-life settings, and should be a valuable resource for a practicing data scientist.


Bayesian Machine Learning, Explained

@machinelearnbot

So you know Bayes' rule. How does it relate to machine learning? It can be quite difficult to grasp how the puzzle pieces fit together; we know it took us a while. This article is the introduction we wish we had had back then. While we have some grasp of the matter, we're not experts, so the following might contain inaccuracies or even outright errors. Feel free to point them out, either in the comments or privately.



The Kernel Mixture Network: A Nonparametric Method for Conditional Density Estimation of Continuous Random Variables

arXiv.org Machine Learning

This paper introduces the kernel mixture network, a new method for nonparametric estimation of conditional probability densities using neural networks. We model arbitrarily complex conditional densities as linear combinations of a family of kernel functions centered at a subset of training points. The weights are determined by the outer layer of a deep neural network, trained by minimizing the negative log likelihood. This generalizes the popular quantized softmax approach, which can be seen as a kernel mixture network with square and non-overlapping kernels. We test the performance of our method on two important applications, namely Bayesian filtering and generative modeling. In the Bayesian filtering example, we show that the method can be used to filter complex nonlinear and non-Gaussian signals defined on manifolds. The resulting kernel mixture network filter outperforms both the quantized softmax filter and the extended Kalman filter in terms of model likelihood. Finally, our experiments on generative models show that, given the same architecture, the kernel mixture network leads to higher test set likelihood, less overfitting and more diversified and realistic generated samples than the quantized softmax approach.
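
The density described in this abstract, a weighted sum of kernels whose weights come from a network's outer layer, can be written out directly. The sketch below is illustrative only: Gaussian kernels, a shared bandwidth, and the names `centers`, `bandwidth`, and `logits` are assumptions made here, and the `logits` array simply stands in for the output of a neural network that is not implemented.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gaussian_kernel(y, centers, bandwidth):
    # y: (n,), centers: (k,) -> kernel values of shape (n, k).
    diff = (y[:, None] - centers[None, :]) / bandwidth
    return np.exp(-0.5 * diff ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))

def kmn_density(y, logits, centers, bandwidth):
    # p(y | x) = sum_k w_k(x) * K(y; c_k), with w(x) = softmax(logits(x)).
    weights = softmax(logits)
    return (weights * gaussian_kernel(y, centers, bandwidth)).sum(axis=1)

def negative_log_likelihood(y, logits, centers, bandwidth):
    # Training objective named in the abstract: mean negative log likelihood.
    return -np.log(kmn_density(y, logits, centers, bandwidth)).mean()
```

Because the softmax weights sum to one and each kernel is a normalised density, the mixture is itself a valid density in `y` for any network output.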


Estimating Accuracy from Unlabeled Data: A Probabilistic Logic Approach

arXiv.org Machine Learning

We propose an efficient method to estimate the accuracy of classifiers using only unlabeled data. We consider a setting with multiple classification problems where the target classes may be tied together through logical constraints. For example, a set of classes may be mutually exclusive, meaning that a data instance can belong to at most one of them. The proposed method is based on the intuition that: (i) when classifiers agree, they are more likely to be correct, and (ii) when the classifiers make a prediction that violates the constraints, at least one classifier must be making an error. Experiments on four real-world data sets produce accuracy estimates within a few percent of the true accuracy, using solely unlabeled data. Our models also outperform existing state-of-the-art solutions in both estimating accuracies, and combining multiple classifier outputs. The results emphasize the utility of logical constraints in estimating accuracy, thus validating our intuition.


CDS Rate Construction Methods by Machine Learning Techniques

arXiv.org Machine Learning

Regulators require financial institutions to estimate counterparty default risks from liquid CDS quotes for the valuation and risk management of OTC derivatives. However, the vast majority of counterparties do not have liquid CDS quotes and need proxy CDS rates. Existing methods cannot account for counterparty-specific default risks; we propose to construct proxy CDS rates by associating each illiquid counterparty with a liquid CDS proxy using machine learning techniques. After testing 156 classifiers from the 8 most popular classifier families, we found that some classifiers achieve highly satisfactory accuracy rates. Furthermore, we have rank-ordered the performances and investigated performance variations amongst and within the 8 classifier families. This paper is, to the best of our knowledge, the first systematic study of CDS proxy construction by machine learning techniques, and the first systematic classifier comparison study based entirely on financial market data. Its findings both confirm and contrast with the existing classifier performance literature. Given the typically highly correlated nature of financial data, we investigated the impact of correlation on classifier performance. The techniques used in this paper should be of interest to financial institutions seeking a CDS proxy method, and can serve for proxy construction for other financial variables. Some directions for future research are indicated.


Applying Bayes Theorem: Simulating the Monty Hall Problem with Python

#artificialintelligence

The Monty Hall problem was first featured on the classic game show "Let's Make a Deal". In the final segment of the show, contestants were presented with a choice of three different doors. Behind two of the doors was a goat, and behind the third was an extravagant prize such as a car. The contestant began the game by picking one door. The host, Monty Hall, would then open one of the remaining doors.
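
The game described above can be simulated in a few lines of Python. This is a minimal sketch, not the code from the linked article: the excerpt stops before the rest of the game, so the sketch assumes the standard rules (the host always opens a goat door that the contestant did not pick, then offers a switch), and the function names `play` and `win_rate` are made up for illustration.

```python
import random

def play(switch, rng):
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # Assumed standard rule: the host opens a door that is neither
    # the contestant's pick nor the prize door.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Switch to the only door that is still closed and not picked.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=20000, seed=0):
    # Fraction of simulated games won under a fixed strategy.
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials
```

Comparing `win_rate(True)` with `win_rate(False)` gives empirical estimates of the two strategies; Bayes' theorem predicts win probabilities of 2/3 for switching and 1/3 for staying.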


Evolving Ensemble Fuzzy Classifier

arXiv.org Artificial Intelligence

The concept of ensemble learning offers a promising avenue for learning from data streams under complex environments because it addresses the bias and variance dilemma better than its single-model counterpart and features a reconfigurable structure, which is well-suited to the given context. While various extensions of ensemble learning for mining nonstationary data streams can be found in the literature, most of them are crafted under a static base classifier and revisit preceding samples in the sliding window for a retraining step. This feature causes computationally prohibitive complexity and is not flexible enough to cope with rapidly changing environments. Their complexity is often demanding because they involve a large collection of offline classifiers, owing to the absence of structural complexity reduction mechanisms and the lack of an online feature selection mechanism. A novel evolving ensemble classifier, namely Parsimonious Ensemble (pENsemble), is proposed in this paper. A dynamic online feature selection scenario is integrated into the pENsemble. This method allows for dynamic selection and deselection of input features on the fly. The efficacy of the pENsemble has been demonstrated through rigorous numerical studies with dynamic and evolving data streams, where it delivers the most encouraging performance in attaining a tradeoff between accuracy and complexity.

I. INTRODUCTION

The data-intensive era, in which data are collected continuously at a fast rate under dynamic and evolving environments, opens a new research direction in processing data streams efficiently [1], [2]. Unlike the classical paradigm in machine learning, where a dataset is used to construct a hypothesis over multiple passes, data streams require a strictly online learning framework with a low memory requirement and, if possible, no memory at all (a one-pass learning mode). Another challenging trait of data streams lies in their non-stationary characteristics [3]: the data do not follow static and predictable distributions and contain a variety of concept drifts [4], [5]. These facts make a retraining phase, in which a new sample is incorporated into an old dataset, infeasible, because it leads to the so-called catastrophic forgetting [6] of previously valid knowledge and is not scalable when dealing with massive data streams. Evolving Intelligent Systems (EIS) provide a unique solution for data stream mining because their strictly one-pass learning procedure has coped very successfully with time-critical applications where data streams are generated at a very fast sampling rate [7]. Furthermore, an EIS adopts an open structure in which its components can be automatically generated, pruned, merged and recalled on the fly [8], [9], so that it can be well-suited to a given problem.


A fuzzy expert system for earthquake prediction, case study: the Zagros range

arXiv.org Artificial Intelligence

A methodology for the development of a fuzzy expert system (FES) with application to earthquake prediction is presented. The idea is to reproduce the performance of a human expert in earthquake prediction. To do this, in the first step, rules provided by the human expert are used to generate a fuzzy rule base. These rules are then fed into an inference engine to produce a fuzzy inference system (FIS) and to infer the results. In this paper, we have used a Sugeno-type fuzzy inference system to build the FES. In the next step, the adaptive network-based fuzzy inference system (ANFIS) is used to refine the FES parameters and improve its performance. The proposed framework is then employed to match the performance of a human expert in predicting earthquakes in the Zagros area based on the idea of coupled earthquakes. While the prediction results are promising in parts of the testing set, the general performance indicates that a prediction methodology based on coupled earthquakes needs more investigation and a more complicated reasoning procedure to yield satisfactory predictions.
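
For readers unfamiliar with Sugeno-type inference, the mechanism can be sketched in a few lines: each rule fires with a strength given by a membership function, and the output is the strength-weighted average of the rule consequents. The example below is a zero-order Sugeno system with a single input. The triangular membership functions, the rule values, and the function names are hypothetical and are not taken from the paper's rule base.

```python
def triangular(x, left, peak, right):
    # Triangular membership function: 0 outside (left, right), 1 at the peak.
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def sugeno(x, rules):
    # rules: list of (membership_function, constant_consequent) pairs.
    strengths = [membership(x) for membership, _ in rules]
    total = sum(strengths)
    if total == 0.0:
        return None  # no rule fires for this input
    weighted = sum(w * out for w, (_, out) in zip(strengths, rules))
    return weighted / total

# Hypothetical rule base: a "low", "medium" and "high" input level,
# each mapped to a constant output score.
RULES = [
    (lambda x: triangular(x, -1.0, 0.0, 5.0), 0.1),
    (lambda x: triangular(x, 0.0, 5.0, 10.0), 0.5),
    (lambda x: triangular(x, 5.0, 10.0, 11.0), 0.9),
]
```

With these rules, an input halfway between the "low" and "medium" peaks fires both rules equally, so `sugeno(2.5, RULES)` lands midway between their consequents. A refinement step such as ANFIS would then tune the membership parameters and consequents from data.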