 Uncertainty


Gaussian Processes for Dummies

#artificialintelligence

It always amazes me how I can hear a statement uttered in the space of a few seconds about some aspect of machine learning that then takes me countless hours to understand. I first heard about Gaussian Processes on an episode of the Talking Machines podcast and thought it sounded like a really neat idea. I promptly procured myself a copy of the classic text on the subject, Gaussian Processes for Machine Learning by Rasmussen and Williams, but my tenuous grasp on the Bayesian approach to machine learning meant I got stumped pretty quickly. That's when I began the journey I described in my last post, From both sides now: the math of linear regression. Gaussian Processes (GPs) are the natural next step in that journey as they provide an alternative approach to regression problems.


Softplus Regressions and Convex Polytopes

arXiv.org Machine Learning

To construct flexible nonlinear predictive distributions, the paper introduces a family of softplus function based regression models that convolve, stack, or combine both operations by convolving countably infinite stacked gamma distributions, whose scales depend on the covariates. Generalizing logistic regression, which uses a single hyperplane to partition the covariate space into two halves, softplus regressions employ multiple hyperplanes to construct a confined space, related to a single convex polytope defined by the intersection of multiple half-spaces or a union of multiple convex polytopes, to separate one class from the other. The gamma process is introduced to support the convolution of countably infinite (stacked) covariate-dependent gamma distributions. For Bayesian inference, Gibbs sampling derived via novel data augmentation and marginalization techniques is used to deconvolve and/or demix the highly complex nonlinear predictive distribution. Example results demonstrate that softplus regressions provide flexible nonlinear decision boundaries, achieving classification accuracies comparable to those of kernel support vector machines while requiring significantly less computation for out-of-sample prediction.
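As a rough illustration of how a softplus link can generalize logistic regression from one hyperplane to several, the sketch below uses a "sum of softplus" form, P(y=1|x) = 1 - exp(-sum_k r_k * softplus(x . beta_k)). This particular form, and the names `sum_softplus_prob`, `betas`, and `weights`, are assumptions made for illustration and are not taken from the abstract above. With a single hyperplane and unit weight, the expression reduces exactly to the logistic sigmoid.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Standard logistic function (fine for moderate |z|)."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sum_softplus_prob(x, betas, weights):
    """Assumed illustrative model: 1 - exp(-sum_k r_k * softplus(x . beta_k)).

    betas   : list of hyperplane coefficient vectors (hypothetical name)
    weights : list of non-negative weights r_k, one per hyperplane
    """
    total = sum(r * softplus(dot(x, b)) for r, b in zip(weights, betas))
    return -math.expm1(-total)  # equals 1 - exp(-total)


x = [1.0, 2.0]

# One hyperplane with weight 1 recovers ordinary logistic regression.
single = sum_softplus_prob(x, [[0.5, -0.25]], [1.0])

# Several hyperplanes combine into a more flexible decision region.
multi = sum_softplus_prob(x, [[0.5, -0.25], [-1.0, 0.3]], [1.0, 0.5])
```

Adding hyperplanes with non-negative weights can only raise the predicted probability in this form, which is one way to see how a region bounded by several half-spaces can emerge.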


Under the Hood of the Variational Autoencoder (in Prose and Code)

#artificialintelligence

In Part I of this series, we introduced the theory and intuition behind the VAE, an exciting development in machine learning for combined generative modeling and inference--"machines that imagine and reason." To recap: VAEs put a probabilistic spin on the basic autoencoder paradigm--treating their inputs, hidden representations, and reconstructed outputs as probabilistic random variables within a directed graphical model. With this Bayesian perspective, the encoder becomes a variational inference network, mapping observed inputs to (approximate) posterior distributions over latent space, and the decoder becomes a generative network, capable of mapping arbitrary latent coordinates back to distributions over the original data space. The beauty of this setup is that we can take a principled Bayesian approach toward building systems with a rich internal "mental model" of the observed world, all by training a single, cleverly-designed deep neural network. These benefits derive from an enriched understanding of data as merely the tip of the iceberg--the observed result of an underlying causative probabilistic process.


Predictive modeling, supervised machine learning, and pattern classification

#artificialintelligence

A Support Vector Machine (SVM) is a classification method that searches for hyperplanes that separate two or more classes. The hyperplane with the largest margin is retained, where "margin" is defined as the minimum distance from the sample points to the hyperplane. The sample points that form the margin are called support vectors and establish the final SVM model. Bayes classifiers are based on a statistical model (i.e., Bayes' theorem: calculating posterior probabilities based on the prior probability and the so-called likelihood). A Naive Bayes classifier assumes that all attributes are conditionally independent given the class, so computing the likelihood simplifies to the product of the conditional probabilities of observing the individual attributes given a particular class label. Artificial Neural Networks (ANNs) are graph-like classifiers that mimic the structure of a human or animal "brain", where the interconnected nodes represent the neurons. Decision tree classifiers are tree-like graphs, where nodes in the graph test certain conditions on a particular set of features, and branches split the decision towards the leaf nodes. Leaves represent the lowest level in the graph and determine the class labels. Optimal trees are trained by minimizing Gini impurity or maximizing information gain.
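The two split criteria mentioned for decision trees can be sketched in a few lines. This is a minimal illustration using the textbook definitions (Gini impurity as one minus the sum of squared class proportions, information gain as the drop in Shannon entropy after a split); the function names and the toy labels are hypothetical and not from the original post.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """1 - sum of squared class proportions; 0 means a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Toy example: a perfectly mixed node split into two pure children.
parent = ["a", "a", "b", "b"]
left, right = ["a", "a"], ["b", "b"]

parent_gini = gini_impurity(parent)
gain = information_gain(parent, left, right)
```

A tree learner would evaluate candidate splits with one of these scores and keep the split that lowers impurity (or raises information gain) the most.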


Computational and Statistical Tradeoffs in Learning to Rank

arXiv.org Machine Learning

For massive and heterogeneous modern datasets, it is of fundamental interest to provide guarantees on the accuracy of estimation when computational resources are limited. In the application of learning to rank, we provide a hierarchy of rank-breaking mechanisms ordered by the complexity of the resulting sketch of the data. This allows the number of data points collected to be gracefully traded off against the computational resources available, while guaranteeing the desired level of accuracy. Theoretical guarantees on the proposed generalized rank-breaking implicitly provide such trade-offs, which can be explicitly characterized under certain canonical scenarios on the structure of the data.


The Matrix Generalized Inverse Gaussian Distribution: Properties and Applications

arXiv.org Machine Learning

While the Matrix Generalized Inverse Gaussian ($\mathcal{MGIG}$) distribution arises naturally in some settings as a distribution over symmetric positive semi-definite matrices, certain key properties of the distribution and effective ways of sampling from the distribution have not been carefully studied. In this paper, we show that the $\mathcal{MGIG}$ is unimodal, and the mode can be obtained by solving an Algebraic Riccati Equation (ARE) [7]. Based on this property, we propose an importance sampling method for the $\mathcal{MGIG}$ where the mode of the proposal distribution matches that of the target. The proposed sampling method is more efficient than existing approaches [32, 33], which use proposal distributions whose mode may be far from the $\mathcal{MGIG}$'s mode. Further, we illustrate that the posterior distribution in latent factor models, such as probabilistic matrix factorization (PMF) [25], when marginalized over one latent factor, has the $\mathcal{MGIG}$ distribution. The characterization leads to a novel Collapsed Monte Carlo (CMC) inference algorithm for such latent factor models. We illustrate that CMC has a lower log loss or perplexity than MCMC, and needs fewer samples.


Cox process representation and inference for stochastic reaction-diffusion processes

arXiv.org Machine Learning

Complex behaviour in many systems arises from the stochastic interactions of spatially distributed particles or agents. Stochastic reaction-diffusion processes are widely used to model such behaviour in disciplines ranging from biology to the social sciences, yet they are notoriously difficult to simulate and calibrate to observational data. Here we use ideas from statistical physics and machine learning to provide a solution to the inverse problem of learning a stochastic reaction-diffusion process from data. Our solution relies on a nontrivial connection between stochastic reaction-diffusion processes and spatiotemporal Cox processes, a well-studied class of models from computational statistics. This connection leads to an efficient and flexible algorithm for parameter inference and model selection. Our approach shows excellent accuracy on numeric and real data examples from systems biology and epidemiology. Our work provides both insights into spatiotemporal stochastic systems, and a practical solution to a longstanding problem in computational modelling. Many complex behaviours in several disciplines originate from a common mechanism: the dynamics of locally interacting, spatially distributed agents. Examples arise at all spatial scales and in a wide range of scientific fields, from microscopic interactions of low-abundance molecules within cells, to ecological and epidemic phenomena at the continental scale. Frequently, stochasticity and spatial heterogeneity play a crucial role in determining the process dynamics and the emergence of collective behaviour [1]-[8]. Stochastic reaction-diffusion processes (SRDPs) constitute a convenient mathematical framework to model such systems. SRDPs were originally introduced in statistical physics [10, 11] to describe the collective behaviour of populations of point-wise agents performing Brownian diffusion in space and stochastically interacting with other, nearby agents according to predefined rules.
The flexibility afforded by the local interaction rules has led to a wide application of SRDPs in many different scientific disciplines where complex spatiotemporal behaviours arise, from molecular biology [4, 9, 12], to ecology [13], to the social sciences [14]. Despite their popularity, SRDPs pose considerable challenges, as analytical computations are only possible for a handful of systems [8].


The Mathematics of Machine Learning

#artificialintelligence

In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I've observed that some actually lack the necessary mathematical intuition and framework to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of many easy-to-use machine and deep learning packages such as scikit-learn, Weka, TensorFlow, etc. Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and for getting good results.


Spatial Modeling of Oil Exploration Areas Using Neural Networks and ANFIS in GIS

arXiv.org Machine Learning

Exploration of hydrocarbon resources is a highly complicated and expensive process in which various geological, geochemical and geophysical factors are developed and then combined. The design of the seismic data acquisition survey and the placement of exploratory wells are highly significant, since incorrect or imprecise locations waste time and money during the operation. The objective of this study is to locate high-potential oil and gas fields in the 1:250,000 sheet of Ahwaz, which includes 20 oil fields, to reduce both time and costs in exploration and production processes. In this regard, 17 maps were developed using GIS functions for factors including: minimum and maximum of total organic carbon (TOC), yield potential for hydrocarbons production (PP), Tmax peak, production index (PI), oxygen index (OI), hydrogen index (HI) as well as presence or proximity to high residual Bouguer gravity anomalies, proximity to anticline axis and faults, topography and curvature maps obtained from Asmari Formation subsurface contours. To model and integrate the maps, this study employed artificial neural network and adaptive neuro-fuzzy inference system (ANFIS) methods. The results obtained from model validation demonstrated that the 17x10x5 neural network with R=0.8948, RMS=0.0267, and kappa=0.9079 can be trained better than other models such as ANFIS and predicts the potential areas more accurately. However, this method failed to predict some oil fields and wrongly predicted some areas as potential zones.


String and Membrane Gaussian Processes

arXiv.org Machine Learning

In this paper we introduce a novel framework for making exact nonparametric Bayesian inference on latent functions that is particularly suitable for Big Data tasks. Firstly, we introduce a class of stochastic processes we refer to as string Gaussian processes (string GPs), which are not to be mistaken for Gaussian processes operating on text. We construct string GPs so that their finite-dimensional marginals exhibit suitable local conditional independence structures, which allow for scalable, distributed, and flexible nonparametric Bayesian inference, without resorting to approximations, and while ensuring some mild global regularity constraints. Furthermore, string GP priors naturally cope with heterogeneous input data, and the gradient of the learned latent function is readily available for explanatory analysis. Secondly, we provide some theoretical results relating our approach to the standard GP paradigm. In particular, we prove that some string GPs are Gaussian processes, which provides a complementary global perspective on our framework. Finally, we derive a scalable and distributed MCMC scheme for supervised learning tasks under string GP priors. The proposed MCMC scheme has computational time complexity $\mathcal{O}(N)$ and memory requirement $\mathcal{O}(dN)$, where $N$ is the data size and $d$ the dimension of the input space. We illustrate the efficacy of the proposed approach on several synthetic and real-world datasets, including a dataset with $6$ million input points and $8$ attributes.