# Uncertainty

### Modelling Latent Travel Behaviour Characteristics with Generative Machine Learning

In this paper, we implement an information-theoretic approach to travel behaviour analysis by introducing a generative modelling framework to identify informative latent characteristics in travel decision making. It involves developing a joint tri-partite Bayesian graphical network model using a Restricted Boltzmann Machine (RBM) generative modelling framework. We apply this framework to mode choice survey data to identify abstract latent variables and compare its performance with a traditional latent variable model with specific latent preferences -- safety, comfort, and environmental. Data collected from a joint stated and revealed preference mode choice survey in Quebec, Canada were used to calibrate the RBM model. Results show a significant impact on model likelihood statistics and suggest that machine learning tools are highly suitable for modelling complex networks of conditionally independent behaviour interactions.
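
To make the conditional-independence structure of an RBM concrete, the sketch below computes the hidden-unit activation probabilities of a standard binary RBM, in which the hidden units are independent given the visible units. It is only an illustration: it is not the paper's tri-partite model, and the function names, weights and biases are hypothetical toy values.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(visible, weights, hidden_bias):
    """Return P(h_j = 1 | v) for each hidden unit of a binary RBM.

    weights[i][j] connects visible unit i to hidden unit j. Given v, each
    hidden unit depends only on its own bias and incoming weights, so the
    probabilities can be computed one unit at a time.
    """
    probs = []
    for j, b_j in enumerate(hidden_bias):
        activation = b_j + sum(v_i * weights[i][j] for i, v_i in enumerate(visible))
        probs.append(sigmoid(activation))
    return probs


# Hypothetical toy values: 3 observed binary indicators, 2 latent units.
visible = [1, 0, 1]
weights = [[0.5, -1.0], [0.2, 0.3], [-0.5, 2.0]]
hidden_bias = [0.0, 0.0]
probs = hidden_probs(visible, weights, hidden_bias)
```

Each entry of `probs` can be read as how strongly one abstract latent unit responds to the observed indicators.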

### Alternate Estimation of a Classifier and the Class-Prior from Positive and Unlabeled Data

We consider the problem of learning a binary classifier from only positive and unlabeled data (PU learning) and estimating the class-prior in the unlabeled data under the case-control scenario. Most recent PU learning methods require an estimate of the class-prior probability in the unlabeled data, which is obtained in advance with another method. However, such a two-step approach, which first estimates the class-prior and then trains a classifier, may not be optimal, since the estimation error of the class-prior is not taken into account when the classifier is trained. In this paper, we propose a novel unified approach that alternates between estimating the class-prior and training a classifier. Our proposed method is simple to implement and computationally efficient. Through experiments, we demonstrate the practical usefulness of the proposed method.
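
The dependence of PU training on the class-prior can be sketched with the unbiased PU risk estimator that is common in the PU literature. The abstract does not state which objective the authors use, so the estimator, the zero-one loss and all names and scores below are assumptions for illustration only.

```python
def zero_one_loss(score, label):
    """Return 1.0 if sign(score) disagrees with label (+1 or -1), else 0.0."""
    predicted = 1 if score > 0 else -1
    return 0.0 if predicted == label else 1.0


def pu_risk(pos_scores, unl_scores, prior, loss=zero_one_loss):
    """Unbiased PU risk estimate under the case-control scenario.

    risk = prior * E_pos[loss(g, +1)] + E_unl[loss(g, -1)] - prior * E_pos[loss(g, -1)]
    """
    pos_as_pos = sum(loss(s, +1) for s in pos_scores) / len(pos_scores)
    pos_as_neg = sum(loss(s, -1) for s in pos_scores) / len(pos_scores)
    unl_as_neg = sum(loss(s, -1) for s in unl_scores) / len(unl_scores)
    return prior * pos_as_pos + unl_as_neg - prior * pos_as_neg


# Hypothetical classifier scores: one of the four unlabeled points is positive.
pos_scores = [1.0, 2.0]
unl_scores = [1.0, -1.0, -1.0, -1.0]
risk_true_prior = pu_risk(pos_scores, unl_scores, prior=0.25)
risk_wrong_prior = pu_risk(pos_scores, unl_scores, prior=0.5)
```

The same classifier receives a different risk estimate when the prior is misspecified, which is the kind of error a two-step pipeline cannot correct during training.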

### Deep PDF: Probabilistic Surface Optimization and Density Estimation

A probability density function (pdf) encodes the entire stochastic knowledge about the data distribution, where data may represent stochastic observations in robotics, transition state pairs in reinforcement learning or any other empirically acquired modality. Inferring the data pdf is of prime importance, allowing one to analyze various model hypotheses and perform smart decision making. However, most density estimation techniques are limited in their representation expressiveness to a specific kernel type or a predetermined distribution family, and have other restrictions. For example, kernel density estimation (KDE) methods require meticulous parameter search and are extremely slow at querying new points. In this paper we present a novel non-parametric density estimation approach, DeepPDF, that uses a neural network to approximate a target pdf given samples from it. Such a representation provides high inference accuracy for a wide range of target pdfs using a relatively simple network structure, making our method highly statistically robust. This is done via a new stochastic optimization algorithm, *Probabilistic Surface Optimization* (PSO), that turns to advantage the stochastic nature of sample points in order to force the network output to be identical to the output of a target pdf. Once trained, query point evaluation can be done efficiently in DeepPDF by a simple network forward pass, with linear complexity in the number of query points. Moreover, the PSO algorithm is capable of inferring the frequency of data samples and may also be used in other statistical tasks such as conditional estimation and distribution transformation. We compare the derived approach with KDE methods, showing its superior performance and accuracy.
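
The query cost of KDE mentioned above can be seen in a minimal one-dimensional Gaussian KDE: every query evaluates the kernel at every stored sample. This is a plain textbook KDE written for illustration, not DeepPDF or PSO; the function name, sample size and bandwidth are assumptions.

```python
import math
import random


def gaussian_kde(query, samples, bandwidth):
    """Evaluate a 1-D Gaussian KDE at a single query point.

    The sum runs over all stored samples, so one query costs O(n) kernel
    evaluations and the full sample set must be kept in memory.
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((query - s) / bandwidth) ** 2) for s in samples
    )


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(2000)]
bandwidth = 0.3  # hand-picked; the estimate is sensitive to this choice
density_at_zero = gaussian_kde(0.0, samples, bandwidth)
```

A parametric approximator, by contrast, can discard the samples after training and answer each query at a cost that does not grow with the sample count.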

### Deterministic Inequalities for Smooth M-estimators

Ever since the proof of asymptotic normality of the maximum likelihood estimator by Cramér (1946), it has been understood that the basic technique of Taylor series expansion suffices for the asymptotics of $M$-estimators with smooth/differentiable loss functions. Although the Taylor series expansion is a purely deterministic tool, the realization that the asymptotic normality results can also be made deterministic (and so finite sample) has received far less attention. With the advent of big data and high-dimensional statistics, the need for finite sample results has increased. In this paper, we use the (well-known) Banach fixed point theorem to derive various deterministic inequalities that lead to the classical results when studied under randomness. In addition, we provide applications of these deterministic inequalities to cross-validation/subsampling, marginal screening and uniform-in-submodel results that are very useful for post-selection inference and in the study of post-regularization estimators. Our results apply to many classical estimators, in particular, generalized linear models, non-linear regression and the Cox proportional hazards model. Extensions to non-smooth and constrained problems are also discussed.
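
For orientation, the classical Taylor argument referred to above can be sketched as follows. The notation ($L_n$, $\theta_0$, $\hat\theta$, $T$) is assumed here for illustration and is not taken from the paper. For an $M$-estimator $\hat\theta$ minimizing a smooth empirical loss $L_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell(\theta; Z_i)$, a first-order expansion of the gradient around a target $\theta_0$ gives

$$
0 = \nabla L_n(\hat\theta) \approx \nabla L_n(\theta_0) + \nabla^2 L_n(\theta_0)\,(\hat\theta - \theta_0)
\quad\Longrightarrow\quad
\hat\theta - \theta_0 \approx -\left[\nabla^2 L_n(\theta_0)\right]^{-1} \nabla L_n(\theta_0).
$$

One way a fixed-point argument can enter (the abstract does not spell out the construction) is through the map $T(\theta) = \theta - \left[\nabla^2 L_n(\theta_0)\right]^{-1} \nabla L_n(\theta)$, whose fixed points are exactly the stationary points of $L_n$ whenever the Hessian at $\theta_0$ is invertible. If $T$ is a contraction that maps a closed ball around $\theta_0$ into itself, the Banach fixed point theorem yields a unique stationary point in that ball, with no probabilistic assumptions on the data.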

### Cluster Variational Approximations for Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data

Continuous-time Bayesian networks (CTBNs) constitute a general and powerful framework for modeling continuous-time stochastic processes on networks. This makes them particularly attractive for learning the directed structures among interacting entities. However, if the available data is incomplete, one needs to simulate the prohibitively complex CTBN dynamics. Existing approximation techniques, such as sampling and low-order variational methods, either scale unfavorably in system size or are unsatisfactory in terms of accuracy. Inspired by recent advances in statistical physics, we present a new approximation scheme based on cluster-variational methods that significantly improves upon existing variational approximations. We can analytically marginalize the parameters of the approximate CTBN, as these are of secondary importance for structure learning. This recovers a scalable scheme for direct structure learning from incomplete and noisy time-series data. Our approach outperforms existing methods in terms of scalability.

### Bayesian sparse reconstruction: a brute-force approach to astronomical imaging and machine learning

We present a principled Bayesian framework for signal reconstruction, in which the signal is modelled by basis functions whose number (and form, if required) is determined by the data themselves. This approach is based on a Bayesian interpretation of conventional sparse reconstruction and regularisation techniques, in which sparsity is imposed through priors via Bayesian model selection. We demonstrate our method for noisy 1- and 2-dimensional signals, including astronomical images. Furthermore, by using a product-space approach, the number and type of basis functions can be treated as integer parameters and their posterior distributions sampled directly. We show that order-of-magnitude increases in computational efficiency are possible from this technique compared to calculating the Bayesian evidences separately, and that further computational gains are possible using it in combination with dynamic nested sampling. Our approach can be readily applied to neural networks, where it allows the network architecture to be determined by the data in a principled Bayesian manner by treating the number of nodes and hidden layers as parameters.

### The Inductive Bias of Restricted f-GANs

Generative adversarial networks are a novel method for statistical inference that have achieved much empirical success; however, the factors contributing to this success remain ill-understood. In this work, we attempt to analyze generative adversarial learning -- that is, statistical inference as the result of a game between a generator and a discriminator -- with the view of understanding how it differs from classical statistical inference solutions such as maximum likelihood inference and the method of moments. Specifically, we provide a theoretical characterization of the distribution inferred by a simple form of generative adversarial learning called restricted f-GANs -- where the discriminator is a function in a given function class, the distribution induced by the generator is restricted to lie in a pre-specified distribution class and the objective is similar to a variational form of the f-divergence. A consequence of our result is that for linear KL-GANs -- that is, when the discriminator is a linear function over some feature space and f corresponds to the KL-divergence -- the distribution induced by the optimal generator is neither the maximum likelihood nor the method of moments solution, but an interesting combination of both.
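
As background for the objective mentioned above, the standard variational form of an f-divergence is recalled here. The abstract says only that the restricted f-GAN objective is "similar to" this form, so the exact objective analysed in the paper may differ, and the symbols $T$, $\mathcal{T}$ and $\mathcal{Q}$ are notation assumed for this sketch. For convex $f$ with $f(1) = 0$ and convex conjugate $f^*(t) = \sup_u \{ut - f(u)\}$,

$$
D_f(P \,\|\, Q) = \sup_{T} \; \mathbb{E}_{x \sim P}\left[T(x)\right] - \mathbb{E}_{x \sim Q}\left[f^*(T(x))\right].
$$

Restricting the supremum to a discriminator class $\mathcal{T}$ gives a lower bound on $D_f$, and restricting the generator to a distribution class $\mathcal{Q}$ leads to a min-max problem of the form

$$
\min_{Q \in \mathcal{Q}} \; \sup_{T \in \mathcal{T}} \; \mathbb{E}_{x \sim P}\left[T(x)\right] - \mathbb{E}_{x \sim Q}\left[f^*(T(x))\right].
$$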

### Hyperprior Induced Unsupervised Disentanglement of Latent Representations

We address the problem of unsupervised disentanglement of latent representations learnt via deep generative models. In contrast to current approaches that operate on the evidence lower bound (ELBO), we argue that statistical independence in the latent space of VAEs can be enforced in a principled hierarchical Bayesian manner. To this effect, we augment the standard VAE with an inverse-Wishart (IW) prior on the covariance matrix of the latent code. By tuning the IW parameters, we are able to encourage (or discourage) independence in the learnt latent dimensions. Extensive experimental results on a range of datasets (2DShapes, 3DChairs, 3DFaces and CelebA) show that our approach outperforms the $\beta$-VAE and is competitive with the state-of-the-art FactorVAE. Our approach achieves significantly better disentanglement and reconstruction on a new dataset (CorrelatedEllipses) which introduces correlations between the factors of variation.
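
For reference, the inverse-Wishart density on a $d \times d$ covariance matrix $\Sigma$, with scale matrix $\Psi$ and degrees of freedom $\nu > d - 1$, is the standard one below. The symbols are assumed here, and the paper's own parametrization and settings are not given in the abstract.

$$
p(\Sigma \mid \Psi, \nu) = \frac{|\Psi|^{\nu/2}}{2^{\nu d/2}\,\Gamma_d(\nu/2)}\; |\Sigma|^{-(\nu + d + 1)/2} \exp\!\left(-\tfrac{1}{2}\operatorname{tr}\!\left(\Psi \Sigma^{-1}\right)\right),
\qquad
\mathbb{E}[\Sigma] = \frac{\Psi}{\nu - d - 1} \quad (\nu > d + 1).
$$

Under this form, a diagonal $\Psi$ centres the prior on covariance matrices with no cross-correlations, and a larger $\nu$ concentrates the prior more tightly around its mean. This is one plausible reading of how tuning the IW parameters could encourage or discourage independence between latent dimensions.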

### Probabilistic approach to limited-data computed tomography reconstruction

We consider the problem of reconstructing the internal structure of an object from limited x-ray projections. In this work, we use a Gaussian process to model the target function. In contrast to other established methods, this comes with the advantage of not requiring any manual parameter tuning, which is usually needed in classical regularization strategies. Gaussian processes are well known to require heavy computation for the inversion of a covariance matrix; in this work, by employing an approximate spectral-based technique, we reduce the computational complexity and avoid the need for numerical integration. Results from simulated and real data indicate that this approach is less sensitive to streak artifacts than the commonly used method of filtered back projection, an analytic reconstruction algorithm using the Radon inversion formula.
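
The computational burden mentioned above can be seen in the textbook Gaussian process regression equations, shown here for noisy pointwise observations $y = f(x) + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$. This is a simplifying assumption: in tomography the observations are line integrals of $f$ rather than point values, so the paper's formulas will differ in detail. With $K_{ij} = k(x_i, x_j)$ for the $n$ training inputs, $k_* = \left(k(x_1, x_*), \dots, k(x_n, x_*)\right)^\top$ and $k_{**} = k(x_*, x_*)$, the posterior at a test input $x_*$ is

$$
\mathbb{E}[f_* \mid y] = k_*^\top \left(K + \sigma^2 I\right)^{-1} y,
\qquad
\operatorname{Var}[f_* \mid y] = k_{**} - k_*^\top \left(K + \sigma^2 I\right)^{-1} k_*.
$$

Solving with the $n \times n$ matrix $K + \sigma^2 I$ costs $O(n^3)$ operations in general, which is the cost that approximate spectral representations of the kernel aim to reduce.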

### Change-Point Detection on Hierarchical Circadian Models

This paper addresses the problem of change-point detection on sequences of high-dimensional and heterogeneous observations, which also possess a periodic temporal structure. Due to the dimensionality problem, when the time between change-points is on the order of the dimension of the model parameters, drifts in the underlying distribution can be misidentified as changes. To overcome this limitation, we assume that the observations lie in a lower-dimensional manifold that admits a latent variable representation. In particular, we propose a hierarchical model that is computationally feasible, widely applicable to heterogeneous data and robust to missing instances. Additionally, to deal with the observations' periodic dependencies, we employ a circadian model where the data periodicity is captured by non-stationary covariance functions. We validate the proposed technique on synthetic examples and demonstrate its utility in the detection of changes for human behavior characterization.