Goto

Collaborating Authors

 Directed Networks


Diagnosing model misspecification and performing generalized Bayes' updates via probabilistic classifiers

arXiv.org Machine Learning

Model misspecification is a long-standing enigma of the Bayesian inference framework as posteriors tend to get overly concentrated on ill-informed parameter values towards the large sample limit. Tempering of the likelihood has been established as a safer way to do updates from prior to posterior in the presence of model misspecification. At one extreme tempering can ignore the data altogether and at the other extreme it provides the standard Bayes' update when no misspecification is assumed to be present. However, it is an open issue how to best recognize misspecification and choose a suitable level of tempering without access to the true generating model. Here we show how probabilistic classifiers can be employed to resolve this issue. By training a probabilistic classifier to discriminate between simulated and observed data provides an estimate of the ratio between the model likelihood and the likelihood of the data under the unobserved true generative process, within the discriminatory abilities of the classifier. The expectation of the logarithm of a ratio with respect to the data generating process gives an estimation of the negative Kullback-Leibler divergence between the statistical generative model and the true generative distribution. Using a set of canonical examples we show that this divergence provides a useful misspecification diagnostic, a model comparison tool, and a method to inform a generalised Bayesian update in the presence of misspecification for likelihood-based models.


Towards Expressive Priors for Bayesian Neural Networks: Poisson Process Radial Basis Function Networks

arXiv.org Machine Learning

While Bayesian neural networks have many appealing characteristics, current priors do not easily allow users to specify basic properties such as expected lengthscale or amplitude variance. In this work, we introduce Poisson Process Radial Basis Function Networks, a novel prior that is able to encode amplitude stationarity and input-dependent lengthscale. We prove that our novel formulation allows for a decoupled specification of these properties, and that the estimated regression function is consistent as the number of observations tends to infinity. We demonstrate its behavior on synthetic and real examples.


A Beginner's Guide to Machine Learning: What Aspiring Data Scientists Should Know - DZone AI

#artificialintelligence

Before choosing a machine learning algorithm, it's important to know their characteristics to generate desired outputs and build smart systems. Data science is growing super fast. As the demand for AI-enabled solutions is increasing, delivering smarter systems for industries has become essential. And the correctness and efficiency through machine learning operations must be fulfilled to ensure the developed solutions complete all demands. Hence, applying machine learning algorithms on the given dataset to produce righteous results and train the intelligent system is one of the most essential steps from the entire process.


The Use of Machine Learning and Big Five Personality Taxonomy to Predict Construction Workers' Safety Behaviour

arXiv.org Machine Learning

Research has found that many occupational accidents are foreseeable, being the result of people's unsafe behaviour from a retrospective point of view. The prediction of workers' safety behaviour will enable the prior insights into each worker's behavioural tendency and will be useful in the design of management practices prior to the occurrence of accidents and contribute to the reduction of injury rates. In recent years, researchers have found that people do have stable predispositions to engage in certain safety behavioural patterns which vary among individuals as a function of personality features. In this study, an innovative forecasting model, which employs machine learning algorithms, is developed to estimate construction workers' behavioural tendency based on the Big Five personality taxonomy. The data-driven nature of machine learning technique enabled a reliable estimate of the personality-safety behaviour relationship, which allowed this study to provide novel insight that nonlinearity may exist in the relationship between construction workers' personality traits and safety behaviour. The developed model is found to be sufficient to have satisfactory accuracy in explaining and predicting workers' safety behaviour. This finding provides the empirical evidence to support the usefulness of personality traits as effective predictors of people's safety behaviour at work. In addition, this study could have practical implications. The machine learning model developed can help identify vulnerable workers who are more prone to undertake unsafe behaviours, which is proven to have good prediction accuracy and is thereby potentially useful for decision making and safety management on construction sites.


Bayesian Variational Autoencoders for Unsupervised Out-of-Distribution Detection

arXiv.org Machine Learning

Despite their successes, deep neural networks still make unreliable predictions when faced with test data drawn from a distribution different to that of the training data, constituting a major problem for AI safety. While this motivated a recent surge in interest in developing methods to detect such out-of-distribution (OoD) inputs, a robust solution is still lacking. We propose a new probabilistic, unsupervised approach to this problem based on a Bayesian variational autoencoder model, which estimates a full posterior distribution over the decoder parameters using stochastic gradient Markov chain Monte Carlo, instead of fitting a point estimate. We describe how information-theoretic measures based on this posterior can then be used to detect OoD data both in input space as well as in the model's latent space. The effectiveness of our approach is empirically demonstrated.


Large-scale Kernel Methods and Applications to Lifelong Robot Learning

arXiv.org Machine Learning

As the size and richness of available datasets grow larger, the opportunities for solving increasingly challenging problems with algorithms learning directly from data grow at the same pace. Consequently, the capability of learning algorithms to work with large amounts of data has become a crucial scientific and technological challenge for their practical applicability. Hence, it is no surprise that large-scale learning is currently drawing plenty of research effort in the machine learning research community. In this thesis, we focus on kernel methods, a theoretically sound and effective class of learning algorithms yielding nonparametric estimators. Kernel methods, in their classical formulations, are accurate and efficient on datasets of limited size, but do not scale up in a cost-effective manner. Recent research has shown that approximate learning algorithms, for instance random subsampling methods like Nystr\"om and random features, with time-memory-accuracy trade-off mechanisms are more scalable alternatives. In this thesis, we provide analyses of the generalization properties and computational requirements of several types of such approximation schemes. In particular, we expose the tight relationship between statistics and computations, with the goal of tailoring the accuracy of the learning process to the available computational resources. Our results are supported by experimental evidence on large-scale datasets and numerical simulations. We also study how large-scale learning can be applied to enable accurate, efficient, and reactive lifelong learning for robotics. In particular, we propose algorithms allowing robots to learn continuously from experience and adapt to changes in their operational environment. The proposed methods are validated on the iCub humanoid robot in addition to other benchmarks.


Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing

arXiv.org Machine Learning

Various researchers have studied posterior inference of parameters in Bayesian mixture models [24, 42, 23], so that the statistical behavior of such models is relatively well-understood. In contrast, much less is known about the efficiency of different algorithms for sampling from the posterior distributions that arise from Bayesian mixture models. A standard approach for doing so is via some form of Markov Chain Monte Carlo (MCMC). Many different types of MCMC algorithms have been introduced for various types of Bayesian mixture models, including finite Bayesian mixture models [21, 49, 50, 26, 40], Dirichlet process mixture models [37, 41, 25, 28], and hierarchical and nested Dirichlet process models [52, 47]. Despite the plethora of possible MCMC methods, upper bounds on their mixing times are often challenging to establish. We refer the reader to the papers [27, 3, 55, 48, 57] for non-asymptotic upper bounds on mixing times for certain types of Bayesian models, different from those studied in this paper. In recent years, it has been increasingly common in the Bayesian literature to make use of a fractional likelihood--meaning an ordinary likelihood raised to some fractional power. Combining such a fractional likelihood with a prior distribution in the usual way leads to a class of posteriors known as power posterior or fractional posterior distributions. The power posterior distributions have been shown to have attractive properties in terms of robustness to mis-specification in Bayesian mixture models [39], and have been used in various applications 1 arXiv:1912.05153v1


Scalable Bayesian Preference Learning for Crowds

arXiv.org Machine Learning

We propose a scalable Bayesian preference learning method for jointly predicting the preferences of individuals as well as the consensus of a crowd from pairwise labels. Peoples' opinions often differ greatly, making it difficult to predict their preferences from small amounts of personal data. Individual biases also make it harder to infer the consensus of a crowd when there are few labels per item. We address these challenges by combining matrix factorisation with Gaussian processes, using a Bayesian approach to account for uncertainty arising from noisy and sparse data. Our method exploits input features, such as text embeddings and user metadata, to predict preferences for new items and users that are not in the training set. As previous solutions based on Gaussian processes do not scale to large numbers of users, items or pairwise labels, we propose a stochastic variational inference approach that limits computational and memory costs. Our experiments on a recommendation task show that our method is competitive with previous approaches despite our scalable inference approximation. We demonstrate the method's scalability on a natural language processing task with thousands of users and items, and show improvements over the state of the art on this task. We make our software publicly available for future work.


Learn classification algorithms using Python and scikit-learn

#artificialintelligence

This tutorial is part of the Machine learning for developers learning path. In this tutorial, we describe the basics of solving a classification-based machine learning problem, and give you a comparative study of some of the current most popular algorithms. In the open Notebook, click Run to run the cells one at a time. The rest of the tutorial follows the order of the Notebook. Classification is when the feature to be predicted contains categories of values.