Bock, Christian
Online Time Series Anomaly Detection with State Space Gaussian Processes
Bock, Christian, Aubet, François-Xavier, Gasthaus, Jan, Kan, Andrey, Chen, Ming, Callot, Laurent
We propose r-ssGPFA, an unsupervised online anomaly detection model for uni- and multivariate time series that builds on the efficient state space formulation of Gaussian processes. For high-dimensional time series, we extend Gaussian process factor analysis to identify the common latent processes of the time series, allowing us to detect anomalies efficiently and in an interpretable manner. We gain explainability while speeding up computations by imposing an orthogonality constraint on the mapping from the latent processes to the observed time series. We further improve the model's robustness with a simple heuristic that skips Kalman updates when an observation appears anomalous. We investigate the behaviour of our model on synthetic data and show on standard benchmark datasets that our method is competitive with state-of-the-art methods while being computationally cheaper.
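To make the "skip the update" heuristic concrete, here is a minimal numpy sketch of a single Kalman filter step that withholds the correction whenever the standardized innovation looks anomalous. The state space matrices, the Mahalanobis-style score, and the threshold of three standard deviations are illustrative assumptions, not the r-ssGPFA implementation.

```python
import numpy as np

def kalman_step_with_skip(m, P, y, A, Q, H, R, thresh=3.0):
    """One predict/update step of a Kalman filter that skips the update
    (treating the observation as anomalous) when the standardized
    innovation exceeds `thresh`. Illustrative sketch, not r-ssGPFA itself."""
    # Predict step: propagate state mean and covariance.
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q

    # Innovation (one-step-ahead prediction error) and its covariance.
    v = y - H @ m_pred
    S = H @ P_pred @ H.T + R

    # Mahalanobis-style anomaly score of the observation.
    score = float(np.sqrt(v.T @ np.linalg.solve(S, v)))
    if score > thresh:
        # Anomalous observation: skip the update, keep the prediction only.
        return m_pred, P_pred, score, True

    # Regular Kalman update.
    K = P_pred @ H.T @ np.linalg.inv(S)
    m_new = m_pred + K @ v
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new, score, False
```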
Uncovering the Topology of Time-Varying fMRI Data using Cubical Persistence
Rieck, Bastian, Yates, Tristan, Bock, Christian, Borgwardt, Karsten, Wolf, Guy, Turk-Browne, Nicholas, Krishnaswamy, Smita
Functional magnetic resonance imaging (fMRI) is a crucial technology for gaining insights into cognitive processes in humans. Data amassed from fMRI measurements result in volumetric data sets that vary over time. However, analysing such data is challenging because of the high degree of noise and the person-to-person variation in how information is represented in the brain. To address this challenge, we present a novel topological approach that encodes each time point in an fMRI data set as a persistence diagram of topological features, i.e. high-dimensional voids present in the data. This representation does not rely on voxel-by-voxel correspondence and is robust to noise. We show that these time-varying persistence diagrams can be clustered to find meaningful groupings between participants, and that they are also useful in studying the within-subject brain state trajectories of participants performing a particular task. Here, we apply both clustering and trajectory analysis techniques to a group of participants watching the movie 'Partly Cloudy'. We observe significant differences in both brain state trajectories and overall topological activity between adults and children watching the same movie.
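As an illustration of the per-time-point encoding, the following sketch computes a cubical persistence diagram for every volume of a 4D fMRI array using gudhi's `CubicalComplex`. The shape convention, the use of raw voxel intensities as the filtration, and the absence of any preprocessing are assumptions made for brevity, not the paper's pipeline.

```python
import numpy as np
import gudhi  # provides CubicalComplex for cubical persistence

def time_varying_persistence(fmri):
    """Encode each time point of a (T, X, Y, Z) fMRI array as a persistence
    diagram of its cubical filtration. Sketch only: real data would be
    masked and normalised before this step."""
    diagrams = []
    for volume in fmri:
        # Build a cubical complex filtered by voxel intensity and
        # compute its persistence diagram.
        cc = gudhi.CubicalComplex(top_dimensional_cells=volume)
        diagrams.append(cc.persistence())  # list of (dim, (birth, death))
    return diagrams

# Toy usage with random data standing in for a real scan.
diagrams = time_varying_persistence(np.random.rand(5, 16, 16, 16))
```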
Path Imputation Strategies for Signature Models of Irregular Time Series
Moor, Michael, Horn, Max, Bock, Christian, Borgwardt, Karsten, Rieck, Bastian
Originally described by Chen [5, 6, 7] and popularised in the theory of rough paths and controlled differential equations [14, 31, 32], the signature transform, also known as the path signature or simply signature, acts on a continuous vector-valued path of bounded variation, and returns a graded sequence of statistics, which determine a path up to a negligible equivalence class. Moreover, every continuous function of a path can be recovered by applying a linear transform to this collection of statistics [3, Proposition A.6]. This 'universal nonlinearity' property makes the signature a promising nonparametric feature extractor in both generative and discriminative learning scenarios. Further properties include the signature's uniqueness [20], as well as factorial decay of its higher order terms [32]. These theoretical foundations have been accompanied by outstanding empirical results when applying signatures to clinical time series classification tasks [34, 40]. Due to their similarities, we may hope that tools that apply to continuous paths can also be applied to multivariate time series. But since multivariate time series are not continuous paths, one first needs to construct a continuous path before signature techniques are applicable. Previous work [3, 12, 27] characterised this construction as an embedding problem, and typically considered it a minor technical detail. This is exacerbated by the (perfectly sensible) behaviour of software for computing the signature [22, 39], which commonly considers a continuous piecewise linear path as an input, described by its sequence of knots, i.e. values.
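As a concrete illustration of the graded statistics the transform returns, the sketch below computes the signature of a piecewise linear path truncated at level two, combining segments via Chen's identity. The truncation depth and the pure-numpy implementation are choices made for illustration; practical work would use a dedicated package such as esig, iisignature, or signatory.

```python
import numpy as np

def signature_level2(path):
    """Truncated (level-2) signature of a piecewise linear path.

    `path` is an (N, d) array of knots. Returns the level-1 term (shape (d,))
    and the level-2 term (shape (d, d)), built segment by segment with
    Chen's identity. Illustrative sketch only.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    S1 = np.zeros(d)          # level-1 term: total increment of the path
    S2 = np.zeros((d, d))     # level-2 term: iterated integrals
    for delta in np.diff(path, axis=0):
        # Signature of one linear segment is (delta, delta x delta / 2);
        # Chen's identity combines it with the signature accumulated so far.
        S2 += np.outer(S1, delta) + np.outer(delta, delta) / 2.0
        S1 += delta
    return S1, S2

# The antisymmetric part of S2 is the Levy area of the path.
S1, S2 = signature_level2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
```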
Set Functions for Time Series
Horn, Max, Moor, Michael, Bock, Christian, Rieck, Bastian, Borgwardt, Karsten
Nevertheless, in many application domains, in particular healthcare (Yadav et al., 2018), measurements might not necessarily be observed at a regular rate or could be misaligned. Moreover, the presence or absence of a measurement and its observation frequency may carry information of its own (Little & Rubin, 2014), such that imputing the missing values is not always desired. While some algorithms can be readily applied to datasets with varying length, these methods usually assume regular sampling of the data and/or require the measurements across modalities to be aligned/synchronized, preventing their application to the aforementioned settings. Existing approaches for unaligned measurements, by contrast, typically rely on imputation to obtain a regularly sampled version of a data set for classification. Learning a suitable imputation scheme, however, requires understanding the underlying dynamics of a system; this task is significantly more complicated and not necessarily required when classification is the main goal. Furthermore, even though a decoupled imputation scheme followed by classification is generally more scalable, it may lose information that is relevant for prediction tasks. Approaches that jointly optimize both tasks add a large computational overhead, thus suffering from poor scalability or high memory requirements. Our method is motivated by the understanding that, while RNNs and similar architectures are well suited for capturing and modelling the dynamics of a time series and thus excel at tasks such as forecasting, retaining the order of an input sequence can even be a disadvantage in classification scenarios.
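A minimal sketch of the set-based view argued for here: each irregularly sampled, unaligned series is flattened into an unordered collection of (time, value, modality) observations and summarised by a permutation-invariant sum decomposition. The encoding, the mean pooling, and the placeholder functions `phi` and `rho` are illustrative assumptions rather than the paper's architecture.

```python
import numpy as np

def as_observation_set(times, values, modalities, num_modalities):
    """Flatten an irregularly sampled multivariate series into an unordered
    set of (time, value, one-hot modality) tuples -- no imputation, no
    alignment. Encoding details are illustrative assumptions."""
    rows = []
    for t, v, m in zip(times, values, modalities):
        one_hot = np.zeros(num_modalities)
        one_hot[m] = 1.0
        rows.append(np.concatenate(([t, v], one_hot)))
    return np.stack(rows)

def sum_decomposition(obs_set, phi, rho):
    """Permutation-invariant classifier skeleton: rho(mean_i phi(s_i)).
    `phi` and `rho` stand in for learned networks."""
    return rho(np.mean([phi(s) for s in obs_set], axis=0))

# Toy usage: heart rate (modality 0) and lactate (modality 1) observed at
# arbitrary, unaligned times; the output is invariant to row order.
S = as_observation_set(times=[0.0, 0.3, 1.7], values=[72.0, 71.0, 2.1],
                       modalities=[0, 0, 1], num_modalities=2)
score = sum_decomposition(S, phi=np.tanh, rho=np.sum)
```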
Machine learning for early prediction of circulatory failure in the intensive care unit
Hyland, Stephanie L., Faltys, Martin, Hüser, Matthias, Lyu, Xinrui, Gumbsch, Thomas, Esteban, Cristóbal, Bock, Christian, Horn, Max, Moor, Michael, Rieck, Bastian, Zimmermann, Marc, Bodenham, Dean, Borgwardt, Karsten, Rätsch, Gunnar, Merz, Tobias M.
Intensive care clinicians are presented with large quantities of patient information and measurements from a multitude of monitoring systems. The limited ability of humans to process such complex information makes it difficult for physicians to readily recognize and act on early signs of patient deterioration. We used machine learning to develop an early warning system for circulatory failure based on a high-resolution ICU database with 240 patient-years of data. This automatic system predicts 90.0% of circulatory failure events (prevalence 3.1%), with 81.8% identified more than two hours in advance, resulting in an area under the receiver operating characteristic curve of 94.0% and an area under the precision-recall curve of 63.0%. The model was externally validated in a large independent patient cohort.
Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology
Rieck, Bastian, Togninalli, Matteo, Bock, Christian, Moor, Michael, Horn, Max, Gumbsch, Thomas, Borgwardt, Karsten
While many approaches to make neural networks more fathomable have been proposed, they are restricted to interrogating the network with input data. Measures for characterizing and monitoring structural properties, however, have not been developed. In this work, we propose neural persistence, a complexity measure for neural network architectures based on topological data analysis of weighted stratified graphs. To demonstrate the usefulness of our approach, we show that neural persistence reflects best practices developed in the deep learning community, such as dropout and batch normalization. Moreover, we derive a neural persistence-based stopping criterion that shortens the training process while achieving accuracies comparable to early stopping based on validation loss.
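To illustrate the kind of quantity neural persistence summarises, the sketch below computes zero-dimensional persistence for a single fully connected layer, treated as a bipartite graph filtered by normalized absolute weight, and reports a p-norm of the resulting lifetimes. The normalization, tie handling, and choice of norm are assumptions for illustration and are not claimed to reproduce the paper's exact definition.

```python
import numpy as np

def layer_neural_persistence(W, p=2):
    """0-dimensional persistence of one fully connected layer, viewed as a
    bipartite graph filtered by normalized absolute weight, summarised by
    the p-norm of the persistence values. Sketch under stated assumptions."""
    n_in, n_out = W.shape
    w = np.abs(W) / np.abs(W).max()          # filtration values in [0, 1]

    parent = list(range(n_in + n_out))        # union-find over all neurons
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]     # path halving
            i = parent[i]
        return i

    # Add edges from strongest to weakest; every merge of two components
    # yields a persistence pair (birth = 1, death = edge weight).
    edges = sorted(((w[i, j], i, n_in + j) for i in range(n_in)
                    for j in range(n_out)), reverse=True)
    persistence = []
    for weight, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            persistence.append(1.0 - weight)  # lifetime of the dying component
    return np.linalg.norm(persistence, ord=p)

# Toy usage on a random 8x4 weight matrix.
print(layer_neural_persistence(np.random.randn(8, 4)))
```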