Goto

Collaborating Authors

 Bayesian Inference


Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models

arXiv.org Machine Learning

Contaminant observations and outliers often cause problems when estimating the parameters of cognitive models, which are statistical models representing cognitive processes. In this study, we test and improve the robustness of parameter estimation using amortized Bayesian inference (ABI) with neural networks. To this end, we conduct systematic analyses on a toy example and analyze both synthetic and real data using a popular cognitive model, the Drift Diffusion Models (DDM). First, we study the sensitivity of ABI to contaminants with tools from robust statistics: the empirical influence function and the breakdown point. Next, we propose a data augmentation or noise injection approach that incorporates a contamination distribution into the data-generating process during training. We examine several candidate distributions and evaluate their performance and cost in terms of accuracy and efficiency loss relative to a standard estimator. Introducing contaminants from a Cauchy distribution during training considerably increases the robustness of the neural density estimator as measured by bounded influence functions and a much higher breakdown point. Overall, the proposed method is straightforward and practical to implement and has a broad applicability in fields where outlier detection or removal is challenging.


Graph Structure Refinement with Energy-based Contrastive Learning

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) have recently gained widespread attention as a successful tool for analyzing graph-structured data. However, imperfect graph structure with noisy links lacks enough robustness and may damage graph representations, therefore limiting the GNNs' performance in practical tasks. Moreover, existing generative architectures fail to fit discriminative graph-related tasks. To tackle these issues, we introduce an unsupervised method based on a joint of generative training and discriminative training to learn graph structure and representation, aiming to improve the discriminative performance of generative models. We propose an Energy-based Contrastive Learning (ECL) guided Graph Structure Refinement (GSR) framework, denoted as ECL-GSR. To our knowledge, this is the first work to combine energy-based models with contrastive learning for GSR. Specifically, we leverage ECL to approximate the joint distribution of sample pairs, which increases the similarity between representations of positive pairs while reducing the similarity between negative ones. Refined structure is produced by augmenting and removing edges according to the similarity metrics among node representations. Extensive experiments demonstrate that ECL-GSR outperforms the state-of-the-art on eight benchmark datasets in node classification. ECL-GSR achieves faster training with fewer samples and memories against the leading baseline, highlighting its simplicity and efficiency in downstream tasks.


An Anomaly Detection System Based on Generative Classifiers for Controller Area Network

arXiv.org Artificial Intelligence

As electronic systems become increasingly complex and prevalent in modern vehicles, securing onboard networks is crucial, particularly as many of these systems are safety-critical. Researchers have demonstrated that modern vehicles are susceptible to various types of attacks, enabling attackers to gain control and compromise safety-critical electronic systems. Consequently, several Intrusion Detection Systems (IDSs) have been proposed in the literature to detect such cyber-attacks on vehicles. This paper introduces a novel generative classifier-based Intrusion Detection System (IDS) designed for anomaly detection in automotive networks, specifically focusing on the Controller Area Network (CAN). Leveraging variational Bayes, our proposed IDS utilizes a deep latent variable model to construct a causal graph for conditional probabilities. An auto-encoder architecture is utilized to build the classifier to estimate conditional probabilities, which contribute to the final prediction probabilities through Bayesian inference. Comparative evaluations against state-of-the-art IDSs on a public Car-hacking dataset highlight our proposed classifier's superior performance in improving detection accuracy and F1-score. The proposed IDS demonstrates its efficacy by outperforming existing models with limited training data, providing enhanced security assurance for automotive systems.


Causal Discovery on Dependent Binary Data

arXiv.org Machine Learning

The assumption of independence between observations (units) in a dataset is prevalent across various methodologies for learning causal graphical models. However, this assumption often finds itself in conflict with real-world data, posing challenges to accurate structure learning. We propose a decorrelation-based approach for causal graph learning on dependent binary data, where the local conditional distribution is defined by a latent utility model with dependent errors across units. We develop a pairwise maximum likelihood method to estimate the covariance matrix for the dependence among the units. Then, leveraging the estimated covariance matrix, we develop an EM-like iterative algorithm to generate and decorrelate samples of the latent utility variables, which serve as decorrelated data. Any standard causal discovery method can be applied on the decorrelated data to learn the underlying causal graph. We demonstrate that the proposed decorrelation approach significantly improves the accuracy in causal graph learning, through numerical experiments on both synthetic and real-world datasets.


Generalized Grade-of-Membership Estimation for High-dimensional Locally Dependent Data

arXiv.org Machine Learning

This work focuses on the mixed membership models for multivariate categorical data widely used for analyzing survey responses and population genetics data. These grade of membership (GoM) models offer rich modeling power but present significant estimation challenges for high-dimensional polytomous data. Popular existing approaches, such as Bayesian MCMC inference, are not scalable and lack theoretical guarantees in high-dimensional settings. To address this, we first observe that data from this model can be reformulated as a three-way (quasi-)tensor, with many subjects responding to many items with varying numbers of categories. We introduce a novel and simple approach that flattens the three-way quasi-tensor into a "fat" matrix, and then perform a singular value decomposition of it to estimate parameters by exploiting the singular subspace geometry. Our fast spectral method can accommodate a broad range of data distributions with arbitrarily locally dependent noise, which we formalize as the generalized-GoM models. We establish finite-sample entrywise error bounds for the generalized-GoM model parameters. This is supported by a new sharp two-to-infinity singular subspace perturbation theory for locally dependent and flexibly distributed noise, a contribution of independent interest. Simulations and applications to data in political surveys, population genetics, and single-cell sequencing demonstrate our method's superior performance.


Towards Strong AI: Transformational Beliefs and Scientific Creativity

arXiv.org Artificial Intelligence

Strong artificial intelligence (AI) is envisioned to possess general cognitive abilities and scientific creativity comparable to human intelligence, encompassing both knowledge acquisition and problem-solving. While remarkable progress has been made in weak AI, the realization of strong AI remains a topic of intense debate and critical examination. In this paper, we explore pivotal innovations in the history of astronomy and physics, focusing on the discovery of Neptune and the concept of scientific revolutions as perceived by philosophers of science. Building on these insights, we introduce a simple theoretical and statistical framework of weak beliefs, termed the Transformational Belief (TB) framework, designed as a foundation for modeling scientific creativity. Through selected illustrative examples in statistical science, we demonstrate the TB framework's potential as a promising foundation for understanding, analyzing, and even fostering creativity -- paving the way toward the development of strong AI. We conclude with reflections on future research directions and potential advancements.


Hybrid Local Causal Discovery

arXiv.org Artificial Intelligence

Local causal discovery aims to learn and distinguish the direct causes and effects of a target variable from observed data. Existing constraint-based local causal discovery methods use AND or OR rules in constructing the local causal skeleton, but using either rule alone is prone to produce cascading errors in the learned local causal skeleton, and thus impacting the inference of local causal relationships. On the other hand, directly applying score-based global causal discovery methods to local causal discovery may randomly return incorrect results due to the existence of local equivalence classes. To address the above issues, we propose a Hybrid Local Causal Discovery algorithm, called HLCD. Specifically, HLCD initially utilizes a constraint-based approach combined with the OR rule to obtain a candidate skeleton and then employs a score-based method to eliminate redundant portions in the candidate skeleton. Furthermore, during the local causal orientation phase, HLCD distinguishes between V-structures and equivalence classes by comparing the local structure scores between the two, thereby avoiding orientation interference caused by local equivalence classes. We conducted extensive experiments with seven state-of-the-art competitors on 14 benchmark Bayesian network datasets, and the experimental results demonstrate that HLCD significantly outperforms existing local causal discovery algorithms.


Learning to Forget: Bayesian Time Series Forecasting using Recurrent Sparse Spectrum Signature Gaussian Processes

arXiv.org Machine Learning

The signature kernel is a kernel between time series of arbitrary length and comes with strong theoretical guarantees from stochastic analysis. It has found applications in machine learning such as covariance functions for Gaussian processes. A strength of the underlying signature features is that they provide a structured global description of a time series. However, this property can quickly become a curse when local information is essential and forgetting is required; so far this has only been addressed with ad-hoc methods such as slicing the time series into subsegments. To overcome this, we propose a principled, data-driven approach by introducing a novel forgetting mechanism for signatures. This allows the model to dynamically adapt its context length to focus on more recent information. To achieve this, we revisit the recently introduced Random Fourier Signature Features, and develop Random Fourier Decayed Signature Features (RFDSF) with Gaussian processes (GPs). This results in a Bayesian time series forecasting algorithm with variational inference, that offers a scalable probabilistic algorithm that processes and transforms a time series into a joint predictive distribution over time steps in one pass using recurrence. For example, processing a sequence of length $10^4$ steps in $\approx 10^{-2}$ seconds and in $< 1\text{GB}$ of GPU memory. We demonstrate that it outperforms other GP-based alternatives and competes with state-of-the-art probabilistic time series forecasting algorithms.


Solving High-dimensional Inverse Problems Using Amortized Likelihood-free Inference with Noisy and Incomplete Data

arXiv.org Artificial Intelligence

We present a likelihood-free probabilistic inversion method based on normalizing flows for high-dimensional inverse problems. The proposed method is composed of two complementary networks: a summary network for data compression and an inference network for parameter estimation. The summary network encodes raw observations into a fixed-size vector of summary features, while the inference network generates samples of the approximate posterior distribution of the model parameters based on these summary features. The posterior samples are produced in a deep generative fashion by sampling from a latent Gaussian distribution and passing these samples through an invertible transformation. We construct this invertible transformation by sequentially alternating conditional invertible neural network and conditional neural spline flow layers. The summary and inference networks are trained simultaneously. We apply the proposed method to an inversion problem in groundwater hydrology to estimate the posterior distribution of the log-conductivity field conditioned on spatially sparse time-series observations of the system's hydraulic head responses.The conductivity field is represented with 706 degrees of freedom in the considered problem.The comparison with the likelihood-based iterative ensemble smoother PEST-IES method demonstrates that the proposed method accurately estimates the parameter posterior distribution and the observations' predictive posterior distribution at a fraction of the inference time of PEST-IES.


Asymptotically Optimal Search for a Change Point Anomaly under a Composite Hypothesis Model

arXiv.org Machine Learning

We address the problem of searching for a change point in an anomalous process among a finite set of M processes. Specifically, we address a composite hypothesis model in which each process generates measurements following a common distribution with an unknown parameter (vector). This parameter belongs to either a normal or abnormal space depending on the current state of the process. Before the change point, all processes, including the anomalous one, are in a normal state; after the change point, the anomalous process transitions to an abnormal state. Our goal is to design a sequential search strategy that minimizes the Bayes risk by balancing sample complexity and detection accuracy. We propose a deterministic search algorithm with the following notable properties. First, we analytically demonstrate that when the distributions of both normal and abnormal processes are unknown, the algorithm is asymptotically optimal in minimizing the Bayes risk as the error probability approaches zero. In the second setting, where the parameter under the null hypothesis is known, the algorithm achieves asymptotic optimality with improved detection time based on the true normal state. Simulation results are presented to validate the theoretical findings.