Bayesian Inference
Online Structural Change-point Detection of High-dimensional Streaming Data via Dynamic Sparse Subspace Learning
Xu, Ruiyu, Wu, Jianguo, Yue, Xiaowei, Li, Yongxiang
High-dimensional streaming data are becoming increasingly ubiquitous in many fields. They often lie in multiple low-dimensional subspaces, and the manifold structures may change abruptly over time due to pattern shifts or the occurrence of anomalies. However, the problem of detecting the structural changes in real time has not been well studied. To fill this gap, we propose a dynamic sparse subspace learning (DSSL) approach for online structural change-point detection of high-dimensional streaming data. A novel multiple structural change-point model is proposed, and it is shown to be equivalent to maximizing a posterior under certain conditions. The asymptotic properties of the estimators are investigated. The penalty coefficients in our model can be selected by the AMDL criterion based on some historical data. An efficient Pruned Exact Linear Time (PELT) based method is proposed for online optimization and change-point detection. The effectiveness of the proposed method is demonstrated through a simulation study and a real case study using gesture data for motion tracking.
Quantile Surfaces -- Generalizing Quantile Regression to Multivariate Targets
Bieshaar, Maarten, Schreiber, Jens, Vogt, Stephan, Gensler, André, Sick, Bernhard
In this article, we present a novel approach to multivariate probabilistic forecasting. Our approach is based on an extension of single-output quantile regression (QR) to multivariate targets, called quantile surfaces (QS). QS uses a simple yet compelling idea of indexing observations of a probabilistic forecast through direction and vector length to estimate a central tendency. We extend the single-output QR technique to multivariate probabilistic targets. QS efficiently models dependencies in multivariate target variables and represents probability distributions through discrete quantile levels. Therefore, we present a novel two-stage process. In the first stage, we perform a deterministic point forecast (i.e., central tendency estimation). Subsequently, we model the prediction uncertainty using QS involving neural networks called quantile surface regression neural networks (QSNN). Additionally, we introduce new methods for efficient and straightforward evaluation of the reliability and sharpness of the issued probabilistic QS predictions. We complement this with a directional extension of the Continuous Ranked Probability Score (CRPS). Finally, we evaluate our novel approach on synthetic data and two currently researched real-world challenges in two different domains: first, probabilistic forecasting for renewable energy power generation; second, short-term cyclist trajectory forecasting for autonomously driving vehicles. Especially for the latter, our empirical results show that even a simple one-layer QSNN outperforms traditional parametric multivariate forecasting techniques, thus improving the state-of-the-art performance.
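For readers unfamiliar with the single-output quantile regression that QS extends, the following is a minimal sketch of the standard pinball (quantile) loss. It is not the authors' QSNN code; the function names, the synthetic data, and the grid search over constants are assumptions made purely for illustration.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average pinball (quantile) loss of predictions q_pred at level tau."""
    diff = np.asarray(y, dtype=float) - np.asarray(q_pred, dtype=float)
    # Under-predictions are weighted by tau, over-predictions by (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

def best_constant_quantile(y, tau, candidates):
    """Return the candidate constant with the smallest pinball loss (hypothetical helper)."""
    losses = [pinball_loss(y, c, tau) for c in candidates]
    return float(candidates[int(np.argmin(losses))])

# Synthetic data (an assumption for this sketch): standard normal samples.
rng = np.random.default_rng(0)
y = rng.normal(size=2000)
grid = np.linspace(-3.0, 3.0, 601)

# The constant minimizing the pinball loss approximates the empirical 0.9 quantile.
q90 = best_constant_quantile(y, 0.9, grid)
print(q90, np.quantile(y, 0.9))
```

Minimizing this loss over a model's outputs, rather than over a constant, yields a conditional quantile estimate, which is the scalar building block that a multivariate method has to generalize.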
A Unifying Review of Deep and Shallow Anomaly Detection
Ruff, Lukas, Kauffmann, Jacob R., Vandermeulen, Robert A., Montavon, Grégoire, Samek, Wojciech, Kloft, Marius, Dietterich, Thomas G., Müller, Klaus-Robert
Deep learning approaches to anomaly detection have recently improved the state of the art in detection performance on complex datasets such as large collections of images or text. These results have sparked a renewed interest in the anomaly detection problem and led to the introduction of a great variety of new methods. With the emergence of numerous such methods, including approaches based on generative models, one-class classification, and reconstruction, there is a growing need to bring methods of this field into a systematic and unified perspective. In this review we aim to identify the common underlying principles as well as the assumptions that are often made implicitly by various methods. In particular, we draw connections between classic 'shallow' and novel deep approaches and show how this relation might cross-fertilize or extend both directions. We further provide an empirical assessment of major existing methods that is enriched by the use of recent explainability techniques, and present specific worked-through examples together with practical advice. Finally, we outline critical open challenges and identify specific paths for future research in anomaly detection.
CASTLE: Regularization via Auxiliary Causal Graph Discovery
Kyono, Trent, Zhang, Yao, van der Schaar, Mihaela
Regularization improves generalization of supervised models to out-of-sample data. Prior works have shown that prediction in the causal direction (effect from cause) results in lower testing error than the anti-causal direction. However, existing regularization methods are agnostic of causality. We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables. CASTLE learns the causal directed acyclic graph (DAG) as an adjacency matrix embedded in the neural network's input layers, thereby facilitating the discovery of optimal predictors. Furthermore, CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features. We provide a theoretical generalization bound for our approach and conduct experiments on a plethora of synthetic and real publicly available datasets demonstrating that CASTLE consistently leads to better out-of-sample predictions as compared to other popular benchmark regularizers.
Learning an arbitrary mixture of two multinomial logits
In this paper, we consider mixtures of multinomial logistic models (MNL), which are known to $\epsilon$-approximate any random utility model. Despite their long history and broad use, rigorous results are only available for learning a uniform mixture of two MNLs. Continuing this line of research, we study the problem of learning an arbitrary mixture of two MNLs. We show that the identifiability of the mixture models may only fail on an algebraic variety of negligible measure. This is done by reducing the problem of learning a mixture of two MNLs to the problem of solving a system of univariate quartic equations. We also devise an algorithm to learn any mixture of two MNLs using a polynomial number of samples and a linear number of queries, provided that a mixture of two MNLs over some finite universe is identifiable. Several numerical experiments and conjectures are also presented.
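To make the object of study concrete, here is a minimal sketch of the choice probabilities induced by a mixture of two MNLs over a subset of items. It illustrates the model only, not the learning algorithm from the paper; the function names, utilities, and mixing weight are hypothetical.

```python
import numpy as np

def mnl_probs(utilities, subset):
    """MNL choice probabilities over the items in `subset` (softmax of their utilities)."""
    u = np.asarray([utilities[i] for i in subset], dtype=float)
    w = np.exp(u - u.max())  # subtract the max for numerical stability
    return w / w.sum()

def mixture_probs(weight, util_a, util_b, subset):
    """Choice probabilities of a two-component MNL mixture with mixing weight `weight`."""
    return weight * mnl_probs(util_a, subset) + (1.0 - weight) * mnl_probs(util_b, subset)

# Hypothetical utilities for a universe of three items.
u_a = [0.0, 1.0, 2.0]
u_b = [2.0, 0.0, -1.0]

# weight = 0.5 is the uniform mixture; any other weight in (0, 1) is the arbitrary case.
p = mixture_probs(0.3, u_a, u_b, subset=[0, 1, 2])
print(p, p.sum())
```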
Improved High Dimensional Discrete Bayesian Network Inference using Triplet Region Construction
Lin, Peng (Capital University of Economics and Business), Neil, Martin, Fenton, Norman
Performing efficient inference on high dimensional discrete Bayesian Networks (BNs) is challenging. When using exact inference methods the space complexity can grow exponentially with the tree-width, thus making computation intractable. This paper presents a general purpose approximate inference algorithm, based on a new region belief approximation method, called Triplet Region Construction (TRC). TRC reduces the cluster space complexity for factorized models from worst-case exponential to polynomial by performing graph factorization and producing clusters of limited size. Unlike previous generations of region-based algorithms, TRC is guaranteed to converge and effectively addresses the region choice problem that bedevils other region-based algorithms used for BN inference. Our experiments demonstrate that it also achieves significantly more accurate results than competing algorithms.
Learning Optimal Representations with the Decodable Information Bottleneck
Dubois, Yann, Kiela, Douwe, Schwab, David J., Vedantam, Ramakrishna
We address the question of characterizing and finding optimal representations for supervised learning. Traditionally, this question has been tackled using the Information Bottleneck, which compresses the inputs while retaining information about the targets, in a decoder-agnostic fashion. In machine learning, however, our goal is not compression but rather generalization, which is intimately linked to the predictive family or decoder of interest (e.g. linear classifier). We propose the Decodable Information Bottleneck (DIB) that considers information retention and compression from the perspective of the desired predictive family. As a result, DIB gives rise to representations that are optimal in terms of expected test performance and can be estimated with guarantees. Empirically, we show that the framework can be used to enforce a small generalization gap on downstream classifiers and to predict the generalization ability of neural networks.
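For reference, the classical Information Bottleneck that DIB departs from is usually written as the following trade-off, where $Z$ is the representation of the input $X$, $Y$ is the target, $I(\cdot;\cdot)$ is mutual information, and $\beta > 0$ is a trade-off parameter. The notation is an assumption of this note, not taken from the abstract.

$$
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)
$$

The first term rewards compressing the input and the second rewards keeping information about the target. Neither term refers to a particular decoder, which is the decoder-agnostic property the abstract contrasts with DIB.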
Bayesian Restoration of Audio Degraded by Low-Frequency Pulses Modeled via Gaussian Process
de Carvalho, Hugo Tremonte, Ávila, Flávio Rainho, Biscainho, Luiz Wagner Pereira
A common defect found when reproducing old vinyl and gramophone recordings with mechanical devices is the long pulse with significant low-frequency content caused by the interaction of the arm-needle system with deep scratches or even breakages on the media surface. Previous approaches to suppressing such pulses in digital counterparts of the recordings depend on a prior estimation of the pulse location, usually performed via heuristic methods. This paper proposes a novel Bayesian approach capable of jointly estimating the pulse location; interpolating the almost annihilated signal underlying the strong discontinuity that initiates the pulse; and also estimating the long pulse tail by a simple Gaussian Process, allowing its suppression from the corrupted signal. The posterior distribution for the model parameters as well as for the pulse is explored via Markov-Chain Monte Carlo (MCMC) algorithms. Controlled experiments indicate that the proposed method, while requiring significantly less user intervention, achieves perceptual results similar to those of previous approaches and performs well when dealing with naturally degraded signals.
An Intuitive Tutorial to Gaussian Processes Regression
This introduction aims to give readers an intuitive understanding of Gaussian processes regression. Gaussian processes regression (GPR) models have been widely used in machine learning applications because of their representational flexibility and their inherent uncertainty measures over predictions. The paper starts by explaining the mathematical basics that Gaussian processes are built on, including the multivariate normal distribution, kernels, non-parametric models, and joint and conditional probability. Gaussian processes regression is then described in an accessible way, striking a balance between showing unnecessary mathematical derivation steps and omitting key conclusive results. An illustrative implementation of a standard Gaussian processes regression algorithm is provided. Beyond standard Gaussian processes regression, existing software packages that implement state-of-the-art Gaussian processes algorithms are reviewed. Lastly, more advanced Gaussian processes regression models are specified. The paper is written in an accessible way, so readers with an undergraduate science or engineering background should have no difficulty following the content.
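As a rough illustration of what a standard Gaussian processes regression computation looks like, the sketch below evaluates the posterior mean and covariance under a zero-mean prior with a squared-exponential (RBF) kernel. It is not the implementation from the tutorial; the kernel choice, hyperparameter values, function names, and toy data are all assumptions.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    sq_dist = (a - b.T) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and covariance at x_test for a zero-mean GP with Gaussian noise."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v  # K_ss - K_s^T K^{-1} K_s
    return mean, cov

# Toy data (assumed): noise-free samples of sin(x).
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_test = np.array([0.0, 0.5, 5.0])

mean, cov = gp_posterior(x_train, y_train, x_test, noise_var=1e-6)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
print(mean, std)
```

The predictive standard deviation is small near the training inputs and returns toward the prior value far from them (here at x = 5), which is the uncertainty behaviour the abstract refers to.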
Why have a Unified Predictive Uncertainty? Disentangling it using Deep Split Ensembles
Sarawgi, Utkarsh, Zulfikar, Wazeer, Khincha, Rishab, Maes, Pattie
Understanding and quantifying uncertainty in black box Neural Networks (NNs) is critical when they are deployed in real-world settings such as healthcare. Recent works using Bayesian and non-Bayesian methods have shown how a unified predictive uncertainty can be modelled for NNs. Decomposing this uncertainty to disentangle the granular sources of heteroscedasticity in data provides rich information about its underlying causes. We propose a conceptually simple non-Bayesian approach, deep split ensemble, to disentangle the predictive uncertainties using a multivariate Gaussian mixture model. The NNs are trained with clusters of input features, for uncertainty estimates per cluster. We evaluate our approach on a series of benchmark regression datasets, while also comparing with unified uncertainty methods. Extensive analyses using dataset shifts and the empirical rule highlight our inherently well-calibrated models. Our work further demonstrates its applicability in a multi-modal setting using a benchmark Alzheimer's dataset and also shows how deep split ensembles can highlight hidden modality-specific biases. The minimal changes required to NNs and the training procedure, and the high flexibility to group features into clusters, make it readily deployable and useful. The source code is available at https://github.com/wazeerzulfikar/deep-split-ensembles