Bayesian Learning
Modelling Latent Travel Behaviour Characteristics with Generative Machine Learning
The increased use of psychological and perceptual variables in travel choice surveys has motivated a number of studies that investigate the explicit effects of latent behaviour in decision-making. Analysis of travel mode choice has focused on the effects of modal travel cost, time or reliability, and many recent studies have introduced latent behaviour variables to account for unobservable effects (Paulssen et al. [2014], Bhat et al. [2015]). The Integrated Choice and Latent Variable (ICLV) model is a recent development in structural equation modelling (SEM) that handles hybrid endogenous and exogenous variables in decision-making (Ben-Akiva et al. [2002]). The ICLV model has been shown, in some situations, to produce consistent estimates of model parameters, leading to better explanatory solutions (Vij and Walker [2016]). Structural modelling dates back to the 1970s and was originally used in psychology, sociology and market research; recently it has seen growing applications in travel behaviour involving latent preference "attitudinal" variables and measurement "indicators".
Detecting and Explaining Drifts in Yearly Grant Applications
Pauwels, Stephen, Calders, Toon
During the lifetime of a Business Process, changes can be made to the workflow, the required resources, the required documents, and so on. Different traces from the same Business Process within a single log file can thus differ substantially due to these changes. We propose a method that is able to detect concept drift in multivariate log files with a dozen attributes. We test our approach on the BPI Challenge 2018 data, consisting of applications for EU direct payment from farmers in Germany, where we use it to detect concept drift. In contrast to other methods, our algorithm does not require the manual selection of the features used to detect drift. Our method first creates a model that captures the relations between attributes and between events of different time steps. This model is then used to score every event and trace. These scores can be used to detect outlying cases and concept drift. Thanks to the decomposability of the score, we are able to perform detailed root-cause analysis.
Cluster Variational Approximations for Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data
Linzner, Dominik, Koeppl, Heinz
Continuous-time Bayesian networks (CTBNs) constitute a general and powerful framework for modeling continuous-time stochastic processes on networks. This makes them particularly attractive for learning the directed structures among interacting entities. However, if the available data is incomplete, one needs to simulate the prohibitively complex CTBN dynamics. Existing approximation techniques, such as sampling and low-order variational methods, either scale unfavorably in system size, or are unsatisfactory in terms of accuracy. Inspired by recent advances in statistical physics, we present a new approximation scheme based on cluster-variational methods significantly improving upon existing variational approximations. We can analytically marginalize the parameters of the approximate CTBN, as these are of secondary importance for structure learning. This recovers a scalable scheme for direct structure learning from incomplete and noisy time-series data. Our approach outperforms existing methods in terms of scalability.
Bayesian Structure Learning by Recursive Bootstrap
Rohekar, Raanan Y., Gurwicz, Yaniv, Nisimov, Shami, Koren, Guy, Novik, Gal
We address the problem of Bayesian structure learning for domains with hundreds of variables by employing non-parametric bootstrap, recursively. We propose a method that covers both model averaging and model selection in the same framework. The proposed method deals with the main weakness of constraint-based learning---sensitivity to errors in the independence tests---by a novel way of combining bootstrap with constraint-based learning. Essentially, we provide an algorithm for learning a tree, in which each node represents a scored CPDAG for a subset of variables and the level of the node corresponds to the maximal order of conditional independencies that are encoded in the graph. As higher order independencies are tested in deeper recursive calls, they benefit from more bootstrap samples, and are therefore more resistant to the curse-of-dimensionality. Moreover, the re-use of stable low order independencies allows greater computational efficiency. We also provide an algorithm for sampling CPDAGs efficiently from their posterior given the learned tree. We empirically demonstrate that the proposed algorithm scales well to hundreds of variables, and learns better MAP models and more reliable causal relationships between variables than other state-of-the-art methods.
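The idea of using the bootstrap to gauge how stable an independence test is can be illustrated with a minimal sketch. This is not the authors' recursive algorithm: it only resamples the data with replacement and reports how often a simple marginal (order-0) correlation test declares two variables independent. The function names, the Pearson-correlation test, and the threshold of 0.2 are all assumptions made for illustration.

```python
import math
import random
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def bootstrap_independence_rate(
    xs: Sequence[float],
    ys: Sequence[float],
    n_boot: int = 200,
    threshold: float = 0.2,  # assumed cut-off for "independent"
    seed: int = 0,
) -> float:
    """Fraction of bootstrap resamples in which |correlation| < threshold.

    A value near 1.0 means the 'independent' decision is stable under
    resampling; a value near 0.0 means 'dependent' is stable.
    """
    rng = random.Random(seed)
    n = len(xs)
    hits = 0
    for _ in range(n_boot):
        # Non-parametric bootstrap: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([xs[i] for i in idx], [ys[i] for i in idx])
        if abs(r) < threshold:
            hits += 1
    return hits / n_boot
```

A rate far from both 0 and 1 flags a test whose outcome depends heavily on the particular sample, which is the kind of sensitivity the abstract describes as the main weakness of constraint-based learning.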
The Inductive Bias of Restricted f-GANs
Liu, Shuang, Chaudhuri, Kamalika
Generative adversarial networks are a novel method for statistical inference that have achieved much empirical success; however, the factors contributing to this success remain ill-understood. In this work, we attempt to analyze generative adversarial learning -- that is, statistical inference as the result of a game between a generator and a discriminator -- with the view of understanding how it differs from classical statistical inference solutions such as maximum likelihood inference and the method of moments. Specifically, we provide a theoretical characterization of the distribution inferred by a simple form of generative adversarial learning called restricted f-GANs -- where the discriminator is a function in a given function class, the distribution induced by the generator is restricted to lie in a pre-specified distribution class and the objective is similar to a variational form of the f-divergence. A consequence of our result is that for linear KL-GANs -- that is, when the discriminator is a linear function over some feature space and f corresponds to the KL-divergence -- the distribution induced by the optimal generator is neither the maximum likelihood nor the method of moments solution, but an interesting combination of both.
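For orientation, the standard variational form of the f-divergence, as commonly used in the f-GAN literature, can be written as follows. The abstract only says the restricted f-GAN objective is "similar to" this form, so the paper's exact objective may differ; the symbols $T$, $\mathcal{F}$ and $f^{*}$ are notation assumed here.

```latex
% f-divergence between distributions P and Q with densities p and q,
% for a convex function f with f(1) = 0:
D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) \mathrm{d}x

% Variational form, with f^{*}(t) = \sup_{u} \{\, u t - f(u) \,\} the convex conjugate of f:
D_f(P \,\|\, Q) = \sup_{T} \; \mathbb{E}_{x \sim P}\big[T(x)\big] - \mathbb{E}_{x \sim Q}\big[f^{*}(T(x))\big]

% Restricting the discriminator T to a function class \mathcal{F} gives a lower bound:
D_f(P \,\|\, Q) \;\ge\; \sup_{T \in \mathcal{F}} \; \mathbb{E}_{x \sim P}\big[T(x)\big] - \mathbb{E}_{x \sim Q}\big[f^{*}(T(x))\big]

% KL-divergence case: f(u) = u \log u, so f^{*}(t) = e^{\,t-1}.
```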
Bayesian sparse reconstruction: a brute-force approach to astronomical imaging and machine learning
Higson, Edward, Handley, Will, Hobson, Michael, Lasenby, Anthony
We present a principled Bayesian framework for signal reconstruction, in which the signal is modelled by basis functions whose number (and form, if required) is determined by the data themselves. This approach is based on a Bayesian interpretation of conventional sparse reconstruction and regularisation techniques, in which sparsity is imposed through priors via Bayesian model selection. We demonstrate our method for noisy 1- and 2-dimensional signals, including astronomical images. Furthermore, by using a product-space approach, the number and type of basis functions can be treated as integer parameters and their posterior distributions sampled directly. We show that order-of-magnitude increases in computational efficiency are possible from this technique compared to calculating the Bayesian evidences separately, and that further computational gains are possible using it in combination with dynamic nested sampling. Our approach can be readily applied to neural networks, where it allows the network architecture to be determined by the data in a principled Bayesian manner by treating the number of nodes and hidden layers as parameters.
Using the Tsetlin Machine to Learn Human-Interpretable Rules for High-Accuracy Text Categorization with Medical Applications
Berge, Geir Thore, Granmo, Ole-Christoffer, Tveit, Tor Oddbjørn, Goodwin, Morten, Jiao, Lei, Matheussen, Bernt Viggo
Medical applications challenge today's text categorization techniques by demanding both high accuracy and ease-of-interpretation. Although deep learning has provided a leap ahead in accuracy, this leap comes at the sacrifice of interpretability. To address this accuracy-interpretability challenge, we here introduce, for the first time, a text categorization approach that leverages the recently introduced Tsetlin Machine. In all brevity, we represent the terms of a text as propositional variables. From these, we capture categories using simple propositional formulae, such as: if "rash" and "reaction" and "penicillin" then Allergy. The Tsetlin Machine learns these formulae from a labelled text, utilizing conjunctive clauses to represent the particular facets of each category. Indeed, even the absence of terms (negated features) can be used for categorization purposes. Our empirical results are quite conclusive. The Tsetlin Machine either performs on par with or outperforms all of the evaluated methods on both the 20 Newsgroups and IMDb datasets, as well as on a non-public clinical dataset. On average, the Tsetlin Machine delivers the best recall and precision scores across the datasets. The GPU implementation of the Tsetlin Machine is further 8 times faster than the GPU implementation of the neural network. We thus believe that our novel approach can have a significant impact on a wide range of text analysis applications, forming a promising starting point for deeper natural language understanding with the Tsetlin Machine.
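The propositional formulae described above can be made concrete with a small sketch. It only evaluates a hand-written conjunctive clause over the set of terms in a text; it does not implement the Tsetlin Machine's learning procedure. The helper names, the tokenizer, and the negated term in the second clause are hypothetical.

```python
import re
from typing import FrozenSet, NamedTuple, Set


class Clause(NamedTuple):
    """A conjunctive clause: all `required` terms present, no `forbidden` term present."""
    required: FrozenSet[str]
    forbidden: FrozenSet[str] = frozenset()


def tokenize(text: str) -> Set[str]:
    """Represent a text as the set of its lowercase alphabetic terms."""
    return set(re.findall(r"[a-z]+", text.lower()))


def clause_fires(terms: Set[str], clause: Clause) -> bool:
    """Evaluate the conjunction of plain and negated term literals."""
    return clause.required <= terms and not (clause.forbidden & terms)


# The example from the abstract: "rash" and "reaction" and "penicillin" -> Allergy.
ALLERGY = Clause(required=frozenset({"rash", "reaction", "penicillin"}))

# Hypothetical clause with a negated feature: fires only if "tolerated" is absent.
ALLERGY_NOT_TOLERATED = Clause(
    required=frozenset({"penicillin", "reaction"}),
    forbidden=frozenset({"tolerated"}),
)
```

For instance, `clause_fires(tokenize("Rash and swelling; likely a reaction to penicillin."), ALLERGY)` evaluates to `True`, and the clause itself can be read directly as a rule, which is the interpretability argument the abstract makes.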
Change-Point Detection on Hierarchical Circadian Models
Moreno-Muñoz, Pablo, Ramírez, David, Artés-Rodríguez, Antonio
This paper addresses the problem of change-point detection on sequences of high-dimensional and heterogeneous observations, which also possess a periodic temporal structure. Due to the dimensionality problem, when the time between change-points is on the order of the dimension of the model parameters, drifts in the underlying distribution can be misidentified as changes. To overcome this limitation we assume that the observations lie in a lower dimensional manifold that admits a latent variable representation. In particular, we propose a hierarchical model that is computationally feasible, widely applicable to heterogeneous data and robust to missing instances. Additionally, to deal with the observations' periodic dependencies, we employ a circadian model where the data periodicity is captured by non-stationary covariance functions. We validate the proposed technique on synthetic examples and we demonstrate its utility in the detection of changes for human behavior characterization.
Endowing Robots with Longer-term Autonomy by Recovering from External Disturbances in Manipulation through Grounded Anomaly Classification and Recovery Policies
Wu, Hongmin, Luo, Shuangqi, Chen, Longxin, Duan, Shuangda, Chumkamon, Sakmongkon, Liu, Dong, Guan, Yisheng, Rojas, Juan
Robot manipulation is increasingly poised to interact with humans in co-shared workspaces. Despite increasingly robust manipulation and control algorithms, failure modes continue to exist whenever models do not capture the dynamics of the unstructured environment. To obtain longer-term horizons in robot automation, robots must develop introspection and recovery abilities. We contribute a set of recovery policies to deal with anomalies produced by external disturbances, as well as anomaly classification through the use of non-parametric statistics with memoized variational inference with scalable adaptation. A recovery critic stands atop a tightly integrated, graph-based online motion-generation and introspection system that resolves a wide range of anomalous situations. Policies, skills, and introspection models are learned incrementally and contextually in a task. Two task-level recovery policies, re-enactment and adaptation, resolve accidental and persistent anomalies respectively. The introspection system uses non-parametric priors along with Markov jump linear systems and memoized variational inference with scalable adaptation to learn a model from the data. In extensive real-robot experiments, various strenuous anomalous conditions are induced and resolved at different phases of a task and in different combinations. The system executes around-the-clock introspection and recovery and even elicits self-recovery when misclassifications occur.
Addressing the Fundamental Tension of PCGML with Discriminative Learning
Procedural content generation via machine learning (PCGML) is typically framed as the task of fitting a generative model to full-scale examples of a desired content distribution. This approach presents a fundamental tension: the more design effort expended to produce detailed training examples for shaping a generator, the lower the return on investment from applying PCGML in the first place. In response, we propose the use of discriminative models (which capture the validity of a design rather than the distribution of the content) trained on positive and negative examples. Through a modest modification of WaveFunctionCollapse, a commercially-adopted PCG approach that we characterize as using elementary machine learning, we demonstrate a new mode of control for learning-based generators. We demonstrate how an artist might craft a focused set of additional positive and negative examples by critique of the generator's previous outputs. This interaction mode bridges PCGML with mixed-initiative design assistance tools by working with a machine to define a space of valid designs rather than just one new design. Procedural Content Generation via Machine Learning (PCGML) is the recent term for the strategy of controlling content generators using examples [1]. Existing PCGML approaches train their statistical models based on preexisting artist-provided samples of the desired content. However, there is a fundamental tension here: machine learning often works better with more training data, but the effort to produce quality training data is frequently costly enough that the artists might be better off just making the content themselves. Rather than attempting to train a generative statistical model (capturing the distribution of desired content), we focus on applying discriminative learning. In discriminative learning, the model learns to judge whether a candidate content artifact would be valid or desirable, but it does not learn how to generate candidates. 
Pairing a discriminative model with a preexisting content generator, we realize example-driven generation that can be influenced by both positive and negative examples of valid design patterns.
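Pairing a discriminative model with a preexisting generator can be sketched as generate-and-test. The following is a deliberately simple stand-in, not the authors' WaveFunctionCollapse modification: the "discriminator" is a set of adjacent tile pairs seen in positive examples minus those seen in negative examples, and the "generator" proposes random tile rows until one passes. All names, the tile alphabet, and the one-dimensional rows are assumptions for illustration.

```python
import random
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def adjacent_pairs(row: str) -> Set[Pair]:
    """All (left, right) neighbouring tile pairs in a row of tiles."""
    return set(zip(row, row[1:]))


def learn_valid_pairs(positive: Iterable[str], negative: Iterable[str]) -> Set[Pair]:
    """Allow every pair seen in a positive example, then remove every pair
    seen in a negative example. A negative example should therefore contain
    only the pattern being rejected."""
    allowed: Set[Pair] = set()
    for row in positive:
        allowed |= adjacent_pairs(row)
    for row in negative:
        allowed -= adjacent_pairs(row)
    return allowed


def is_valid(row: str, allowed: Set[Pair]) -> bool:
    """Discriminative judgement: valid if every neighbouring pair is allowed."""
    return adjacent_pairs(row) <= allowed


def generate(length: int, tiles: str, allowed: Set[Pair],
             rng: random.Random, max_tries: int = 10_000) -> str:
    """Preexisting 'generator' (uniform random rows) filtered by the discriminator."""
    for _ in range(max_tries):
        candidate = "".join(rng.choice(tiles) for _ in range(length))
        if is_valid(candidate, allowed):
            return candidate
    raise RuntimeError("no valid candidate found within max_tries")


# Hypothetical tiles: G = grass, S = sand, W = water.
POSITIVE = ["GGSW", "WSGG", "GSSG", "WWSS"]
NEGATIVE = ["SS"]  # critique of an earlier output: no two sand tiles in a row
ALLOWED = learn_valid_pairs(POSITIVE, NEGATIVE)
```

Here the single negative example `"SS"` removes one pattern from the space of valid designs without the artist having to author any new full-scale content, which mirrors the critique-based interaction the abstract describes.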