
Collaborating Authors

 Williams, Christopher K. I.


Naive Bayes Classifiers and One-hot Encoding of Categorical Variables

arXiv.org Machine Learning

This paper investigates the consequences of encoding a $K$-valued categorical variable incorrectly as $K$ bits via one-hot encoding, when using a Naïve Bayes classifier. This gives rise to a product-of-Bernoullis (PoB) assumption, rather than the correct categorical Naïve Bayes classifier. The differences between the two classifiers are analysed mathematically and experimentally. In our experiments using probability vectors drawn from a Dirichlet distribution, the two classifiers are found to agree on the maximum a posteriori class label for most cases, although the posterior probabilities are usually greater for the PoB case.
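A minimal sketch of the comparison, assuming Dirichlet-sampled class-conditional probability vectors as in the experiments; variable names are illustrative, not taken from the paper:

```python
# Sketch (not the paper's code): compare a categorical Naive Bayes classifier
# with the product-of-Bernoullis (PoB) classifier induced by one-hot encoding.
import numpy as np

rng = np.random.default_rng(0)
K, n_classes = 5, 2
theta = rng.dirichlet(np.ones(K), size=n_classes)  # P(x = k | class c)
prior = np.full(n_classes, 1.0 / n_classes)

x = 2  # observed category, one-hot encoded as e_x

# Categorical NB: P(c | x) proportional to P(c) * theta[c, x]
cat_post = prior * theta[:, x]
cat_post /= cat_post.sum()

# PoB NB: treats each of the K bits as an independent Bernoulli, so the
# "off" bits also contribute (1 - theta[c, k]) factors to the likelihood.
onehot = np.eye(K)[x]
pob_lik = np.prod(theta ** onehot * (1 - theta) ** (1 - onehot), axis=1)
pob_post = prior * pob_lik
pob_post /= pob_post.sum()

print(cat_post, pob_post)  # typically the same argmax, but different posteriors
```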


Of Mice and Mates: Automated Classification and Modelling of Mouse Behaviour in Groups using a Single Model across Cages

arXiv.org Artificial Intelligence

Behavioural experiments often happen in specialised arenas, but this may confound the analysis. To address this issue, we provide tools to study mice in the home-cage environment, enabling biologists to capture the temporal aspect of each individual's behaviour and to model the interaction and interdependence between cage-mates with minimal human intervention. We develop the Activity Labelling Module (ALM) to automatically classify mouse behaviour from video, and a novel Group Behaviour Model (GBM) for summarising their joint behaviour across cages, using a permutation matrix to match the mouse identities in each cage to the model. We also release two datasets: ABODe for training behaviour classifiers and IMADGE for modelling behaviour.
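As an illustration of the identity-matching step only (not the released ALM/GBM code), a permutation matching mice to model roles can be found with the Hungarian algorithm; the log-likelihood matrix here is a random stand-in:

```python
# Illustrative sketch: match the mice in one cage to the roles of a shared
# group model with a permutation, chosen to maximise the total per-mouse
# log-likelihood under each role.
import numpy as np
from scipy.optimize import linear_sum_assignment

n_mice = 3
# loglik[i, j] = log-likelihood of mouse i's behaviour sequence under role j
loglik = np.log(np.random.default_rng(1).dirichlet(np.ones(n_mice), size=n_mice))

row, col = linear_sum_assignment(-loglik)  # Hungarian algorithm, maximising
P = np.zeros((n_mice, n_mice))
P[row, col] = 1                            # permutation matrix for this cage
print(P, loglik[row, col].sum())
```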


On Suspicious Coincidences and Pointwise Mutual Information

arXiv.org Artificial Intelligence

Barlow (1985) hypothesized that the co-occurrence of two events $A$ and $B$ is "suspicious" if $P(A,B) \gg P(A) P(B)$. We first review classical measures of association for $2 \times 2$ contingency tables, including Yule's $Y$ (Yule, 1912), which depends only on the odds ratio $\lambda$, and is independent of the marginal probabilities of the table. We then discuss the mutual information (MI) and pointwise mutual information (PMI), which depend on the ratio $P(A,B)/P(A)P(B)$, as measures of association. We show that, once the effect of the marginals is removed, MI and PMI behave similarly to $Y$ as functions of $\lambda$. The pointwise mutual information is used extensively in some research communities for flagging suspicious coincidences, but it is important to bear in mind the sensitivity of the PMI to the marginals, with increased scores for sparser events.
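A worked sketch of the quantities discussed, for a hypothetical $2 \times 2$ table of joint probabilities; the cell values are made up for illustration:

```python
# Association measures for a 2x2 contingency table with cell probabilities
# [[a, b], [c, d]] (rows: A / not-A, columns: B / not-B).
import numpy as np

p = np.array([[0.20, 0.05],
              [0.10, 0.65]])   # joint probabilities; all four cells sum to 1
pA, pB = p.sum(axis=1), p.sum(axis=0)  # marginals

lam = (p[0, 0] * p[1, 1]) / (p[0, 1] * p[1, 0])  # odds ratio lambda
Y = (np.sqrt(lam) - 1) / (np.sqrt(lam) + 1)      # Yule's Y: marginal-free

pmi = np.log(p[0, 0] / (pA[0] * pB[0]))          # PMI of the (A, B) cell
mi = np.sum(p * np.log(p / np.outer(pA, pB)))    # mutual information

print(f"lambda={lam:.2f}  Y={Y:.3f}  PMI={pmi:.3f}  MI={mi:.3f}")
```

Here $P(A,B) = 0.20 > P(A)P(B) = 0.075$, so the PMI is positive and the co-occurrence would be flagged as suspicious in Barlow's sense.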


Inference and Learning for Generative Capsule Models

arXiv.org Artificial Intelligence

Capsule networks (see e.g. Hinton et al., 2018) aim to encode knowledge of and reason about the relationship between an object and its parts. In this paper we specify a generative model for such data, and derive a variational algorithm for inferring the transformation of each model object in a scene, and the assignments of observed parts to the objects. We derive a learning algorithm for the object models, based on variational expectation maximization (Jordan et al., 1999). We also study an alternative inference algorithm based on the RANSAC method of Fischler and Bolles (1981). We apply these inference methods to (i) data generated from multiple geometric objects like squares and triangles ("constellations"), and (ii) data from a parts-based model of faces. Recent work by Kosiorek et al. (2019) has used amortized inference via stacked capsule autoencoders (SCAEs) to tackle this problem -- our results show that we significantly outperform them where we can make comparisons (on the constellations data).
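A hedged sketch of RANSAC-style inference for the constellations setting, assuming 2-D points and a similarity transform per object; this illustrates the Fischler-and-Bolles idea, not the paper's implementation:

```python
# Fit a 2-D similarity transform of a template object (e.g. a square's
# corners) to scene points from minimal samples, then count inlier parts.
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2-D similarity (scale-rotation + translation) mapping
    src -> dst, via linear regression over complex numbers."""
    s = src[:, 0] + 1j * src[:, 1]
    d = dst[:, 0] + 1j * dst[:, 1]
    A = np.stack([s, np.ones_like(s)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, d, rcond=None)
    return a, b  # transform: z -> a*z + b

def ransac(template, scene, iters=100, tol=0.05, rng=np.random.default_rng(2)):
    best, best_inliers = None, -1
    z = scene[:, 0] + 1j * scene[:, 1]
    t = template[:, 0] + 1j * template[:, 1]
    for _ in range(iters):
        i = rng.choice(len(template), size=2, replace=False)  # minimal sample
        j = rng.choice(len(scene), size=2, replace=False)
        a, b = fit_similarity(template[i], scene[j])
        pred = a * t + b
        # a template part is an inlier if some scene point lies within tol of it
        inliers = sum(np.min(np.abs(z - p)) < tol for p in pred)
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best, best_inliers
```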


Source-Free Adaptation to Measurement Shift via Bottom-Up Feature Restoration

arXiv.org Artificial Intelligence

Source-free domain adaptation (SFDA) aims to adapt a model trained on labelled data in a source domain to unlabelled data in a target domain without access to the source-domain data during adaptation. Existing methods for SFDA leverage entropy-minimization techniques which: (i) apply only to classification; (ii) destroy model calibration; and (iii) rely on the source model achieving a good level of feature-space class-separation in the target domain. We address these issues for a particularly pervasive type of domain shift called measurement shift, characterized by a change in measurement system (e.g. a change in sensor or lighting). In the source domain, we store a lightweight and flexible approximation of the feature distribution under the source data. In the target domain, we adapt the feature extractor such that the approximate feature distribution under the target data realigns with that saved on the source. We call this method Feature Restoration (FR) as it seeks to extract features with the same semantics from the target domain as were previously extracted from the source. We additionally propose Bottom-Up Feature Restoration (BUFR), a bottom-up training scheme for FR which boosts performance by preserving learnt structure in the later layers of a network. Through experiments we demonstrate that BUFR often outperforms existing SFDA methods in terms of accuracy, calibration, and data efficiency, while being less reliant on the performance of the source model in the target domain.
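A minimal sketch of the Feature Restoration objective, substituting a per-dimension Gaussian approximation for the paper's lightweight approximation of the feature distribution; `encoder` and the data loaders are assumed:

```python
# Sketch of Feature Restoration (FR). Assumption: per-dimension Gaussian
# statistics stand in for the paper's saved source feature distribution.
import torch

@torch.no_grad()
def save_source_stats(encoder, source_loader):
    # Store a lightweight approximation of the source feature distribution.
    feats = torch.cat([encoder(x) for x, _ in source_loader])
    return feats.mean(0), feats.std(0)

def fr_loss(encoder, x_target, src_mean, src_std):
    # Adapt the feature extractor so target features realign with the saved
    # source statistics (symmetric KL between univariate Gaussians, per dim).
    f = encoder(x_target)
    mu, sd = f.mean(0), f.std(0)
    kl_fwd = torch.log(src_std / sd) + (sd**2 + (mu - src_mean)**2) / (2 * src_std**2) - 0.5
    kl_rev = torch.log(sd / src_std) + (src_std**2 + (mu - src_mean)**2) / (2 * sd**2) - 0.5
    return (kl_fwd + kl_rev).mean()

# BUFR: minimise fr_loss while unfreezing the encoder block by block,
# starting from the earliest (bottom) layers, preserving later-layer structure.
```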


On Memorization in Probabilistic Deep Generative Models

arXiv.org Machine Learning

In the last few years there have been incredible successes in generative modeling through the development of deep learning techniques such as variational autoencoders (VAEs) [1, 2], generative adversarial networks (GANs) [3], normalizing flows [4, 5], and diffusion networks [6], among others. The goal of generative modeling is to learn the data distribution of a given data set, which has numerous applications such as creating realistic synthetic data, correcting data corruption, and detecting anomalies. Novel architectures for generative modeling are typically evaluated on how well a complex, high dimensional data distribution can be learned by the model and how realistic the samples from the model are. An important question in the evaluation of generative models is to what extent observations from the training data are memorized by the learning algorithm. A common technique to assess memorization in deep generative models is to look for nearest neighbors. Typically, several samples are drawn from a trained model and compared to their nearest neighbors in the training set. There are several problems with this approach. First, it has been well established that when using the Euclidean metric this test can be easily fooled by taking an image from the training set and shifting it by a few pixels [7]. For this reason, nearest neighbors in the feature space of a secondary model are sometimes used, as well as cropping and/or downsampling before identifying nearest neighbors.
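A sketch of the nearest-neighbour memorisation check described here, run in the feature space of a secondary model; `embed` is a hypothetical embedding function, and the brute-force distance computation is for illustration only:

```python
# Nearest-neighbour check: compare generated samples to their closest
# training points under the Euclidean metric in an embedding space.
import numpy as np

def nearest_neighbours(samples, train_set, embed, k=1):
    """Indices of each sample's k nearest training points in embedding space."""
    S = np.stack([embed(s) for s in samples])    # (n_samples, d)
    T = np.stack([embed(t) for t in train_set])  # (n_train, d)
    d2 = ((S[:, None, :] - T[None, :, :]) ** 2).sum(-1)  # squared Euclidean
    return np.argsort(d2, axis=1)[:, :k]
```

As the passage notes, running this check in raw pixel space is easily fooled by small shifts, which is why an embedding or cropped/downsampled view is used.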


VAEs in the Presence of Missing Data

arXiv.org Machine Learning

Real world datasets often contain entries with missing elements, e.g. in a medical dataset a patient is unlikely to have taken all possible diagnostic tests. Variational Autoencoders (VAEs) are popular generative models often used for unsupervised learning. Despite their widespread use, it is unclear how best to apply VAEs to datasets with missing data. We develop a novel latent variable model of a corruption process which generates missing data, and derive a corresponding tractable evidence lower bound (ELBO). Our model is straightforward to implement, can handle both missing completely at random (MCAR) and missing not at random (MNAR) data, scales to high dimensional inputs, and gives both the VAE encoder and decoder principled access to indicator variables for whether a data element is missing or not. Existing approaches which adapt VAEs to datasets with missing data (Vedantam et al., 2017; Nazabal et al., 2018; Mattei & Frellsen, 2019; Ma et al., 2019) suffer from a number of significant disadvantages, including 1) not handling missing not at random (MNAR) data, 2) replacing missing elements with zeros with no way to distinguish an observed data element with value zero from a missing element, 3) not scaling to high dimensional inputs and/or 4) restricting the types of neural network architectures permitted; these issues are discussed in detail below. We aim to improve upon the handling of missing data by VAEs by addressing the disadvantages of the existing approaches. In particular we propose a novel latent variable probabilistic model of missing data as the result of a corruption process, and derive a tractable ELBO for our proposed model.
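A minimal sketch of one masked-ELBO variant consistent with this description (the encoder and decoder both see the missingness indicators); an assumption-laden illustration, not the paper's exact model:

```python
# Sketch: VAE ELBO for data x with missingness mask m (1 = observed).
# The encoder sees (x * m, m); the reconstruction term is evaluated only
# on observed elements. Gaussian likelihood assumed for brevity.
import torch
import torch.nn.functional as F

def masked_elbo(encoder, decoder, x, m):
    mu, logvar = encoder(torch.cat([x * m, m], dim=-1))
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
    x_hat = decoder(torch.cat([z, m], dim=-1))            # decoder also sees m
    recon = -(F.mse_loss(x_hat, x, reduction="none") * m).sum(-1)
    kl = -0.5 * (1 + logvar - mu**2 - logvar.exp()).sum(-1)
    return (recon - kl).mean()
```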


Customizing Sequence Generation with Multi-Task Dynamical Systems

arXiv.org Machine Learning

Dynamical system models (including RNNs) often lack the ability to adapt the sequence generation or prediction to a given context, limiting their real-world application. In this paper we show that hierarchical multi-task dynamical systems (MTDSs) provide direct user control over sequence generation, via use of a latent code $\mathbf{z}$ that specifies the customization to the individual data sequence. This enables style transfer, interpolation and morphing within generated sequences. We show the MTDS can improve predictions via latent code interpolation, and avoid the long-term performance degradation of standard RNN approaches.
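An illustrative toy version of the MTDS idea, in which the latent code $\mathbf{z}$ generates part of the dynamical system's parameters via a hypernetwork; the architecture details are invented for brevity:

```python
# Toy multi-task dynamical system: a GRU whose output layer is generated
# from a latent code z (a simplification of the MTDS for illustration).
import torch
import torch.nn as nn

class ToyMTDS(nn.Module):
    def __init__(self, obs_dim=2, hid=32, z_dim=3):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hid, batch_first=True)
        self.hypernet = nn.Linear(z_dim, obs_dim * hid)  # z -> output weights

    def forward(self, x, z):
        h, _ = self.rnn(x)                               # (B, T, hid)
        W = self.hypernet(z).view(z.shape[0], x.shape[-1], h.shape[-1])
        return torch.einsum("bth,boh->bto", h, W)        # (B, T, obs_dim)

# Style interpolation/morphing: decode with z = (1 - a) * z1 + a * z2
# for a in [0, 1] to move smoothly between two sequence styles.
```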


Robust Variational Autoencoders for Outlier Detection in Mixed-Type Data

arXiv.org Machine Learning

We focus on the problem of unsupervised cell outlier detection in mixed-type tabular datasets. Traditional methods for outlier detection are concerned only with detecting which rows in the dataset are outliers. However, identifying which cells corrupt a specific row is an important problem in practice, especially in high-dimensional tables. We introduce the Robust Variational Autoencoder (RVAE), a deep generative model that learns the joint distribution of the clean data while identifying the outlier cells in the dataset. RVAE learns the probability of each cell in the dataset being an outlier, balancing the contributions of the different likelihood models in the row outlier score, making the method suitable for outlier detection in mixed-type datasets. We show experimentally that RVAE performs better than several state-of-the-art methods in cell outlier detection for tabular datasets, while providing comparable or better results for row outlier detection.
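A sketch of the cell-wise scoring this describes, under the assumption that each cell's likelihood is a two-component mixture of a clean model and a broad outlier model; the mixture weight `pi` is a hypothetical hyperparameter:

```python
# Cell outlier score as the posterior responsibility of the outlier
# component in a per-cell mixture of "clean" and "outlier" likelihoods.
import torch

def cell_outlier_scores(logp_clean, logp_out, pi=0.1):
    """logp_clean, logp_out: per-cell log-likelihoods, shape (rows, cols).
    Returns P(cell is an outlier | data) under a pi-mixture."""
    log_w_out = torch.log(torch.tensor(pi)) + logp_out
    log_w_clean = torch.log(torch.tensor(1.0 - pi)) + logp_clean
    return (log_w_out - torch.logaddexp(log_w_out, log_w_clean)).exp()

# A row outlier score can then aggregate the cell scores, e.g. scores.mean(1).
```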


The Extended Dawid-Skene Model: Fusing Information from Multiple Data Schemas

arXiv.org Machine Learning

While label fusion from multiple noisy annotations is a well-understood concept in data wrangling (tackled for example by the Dawid-Skene (DS) model), we consider the extended problem of carrying out learning when the labels themselves are not consistently annotated with the same schema. We show that even if annotators use disparate, albeit related, label-sets, we can still draw inferences for the underlying full label-set. We propose the Inter-Schema AdapteR (ISAR) to translate the fully-specified label-set to the one used by each annotator, enabling learning under such heterogeneous schemas without the need to re-annotate the data. We apply our method to a mouse behavioural dataset, achieving significant gains over DS in out-of-sample log-likelihood (-3.40 to -2.39) and F1-score (0.785 to 0.864).
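A toy illustration of the schema-translation idea, with a hypothetical 0/1 mapping matrix that collapses a three-label full set onto a two-label annotator schema:

```python
# Sketch of the ISAR idea: a fixed 0/1 mapping per schema marks which full
# labels an annotator using that schema would report as each coarser label,
# so full-label probabilities can be projected into each annotator's schema.
import numpy as np

p_full = np.array([0.5, 0.3, 0.2])  # P(full label) from the DS-style model
M = np.array([[1, 0, 0],            # schema label 0 <- full label 0
              [0, 1, 1]])           # schema label 1 <- full labels 1 and 2
p_schema = M @ p_full               # P(annotator reports each schema label)
print(p_schema)                     # [0.5, 0.5]
```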