Goto

Collaborating Authors

 Bayesian Learning


Overcoming error-in-variable problem in data-driven model discovery by orthogonal distance regression

arXiv.org Machine Learning

Despite the recent proliferation of machine learning methods like SINDy that promise automatic discovery of governing equations from time-series data, there remain significant challenges to discovering models from noisy datasets. One reason is that the linear regression underlying these methods assumes that all noise resides in the training target (the regressand), which is the time derivative, whereas the measurement noise is in the states (the regressors). Recent methods like modified-SINDy and DySMHO address this error-in-variable problem by leveraging information from the model's temporal evolution, but they are also imposing the equation as a hard constraint, which effectively assumes no error in the regressand. Without relaxation, this hard constraint prevents assimilation of data longer than Lyapunov time. Instead, the fulfilment of the model equation should be treated as a soft constraint to account for the small yet critical error introduced by numerical truncation. The uncertainties in both the regressor and the regressand invite the use of orthogonal distance regression (ODR). By incorporating ODR with the Bayesian framework for model selection, we introduce a novel method for model discovery, termed ODR-BINDy, and assess its performance against current SINDy variants using the Lorenz63, Rossler, and Van Der Pol systems as case studies. Our findings indicate that ODR-BINDy consistently outperforms all existing methods in recovering the correct model from sparse and noisy datasets. For instance, our ODR-BINDy method reliably recovers the Lorenz63 equation from data with noise contamination levels of up to 30%.


Spatiodynamic inference using vision-based generative modelling

arXiv.org Machine Learning

Biological systems commonly exhibit complex spatiotemporal patterns whose underlying generative mechanisms pose a significant analytical challenge. Traditional approaches to spatiodynamic inference rely on dimensionality reduction through summary statistics, which sacrifice complexity and interdependent structure intrinsic to these data in favor of parameter identifiability. This imposes a fundamental constraint on reliably extracting mechanistic insights from spatiotemporal data, highlighting the need for analytical frameworks that preserve the full richness of these dynamical systems. To address this, we developed a simulation-based inference framework that employs vision transformer-driven variational encoding to generate compact representations of the data, exploiting the inherent contextual dependencies. These representations are subsequently integrated into a likelihood-free Bayesian approach for parameter inference. The central idea is to construct a fine-grained, structured mesh of latent representations from simulated dynamics through systematic exploration of the parameter space. This encoded mesh of latent embeddings then serves as a reference map for retrieving parameter values that correspond to observed data. By integrating generative modeling with Bayesian principles, our approach provides a unified inference framework to identify both spatial and temporal patterns that manifest in multivariate dynamical systems.


Incorporating structural uncertainty in causal decision making

arXiv.org Artificial Intelligence

Practitioners making decisions based on causal effects typically ignore structural uncertainty. We analyze when this uncertainty is consequential enough to warrant methodological solutions (Bayesian model averaging over competing causal structures). Focusing on bivariate relationships ($X \rightarrow Y$ vs. $X \leftarrow Y$), we establish that model averaging is beneficial when: (1) structural uncertainty is moderate to high, (2) causal effects differ substantially between structures, and (3) loss functions are sufficiently sensitive to the size of the causal effect. We prove optimality results of our suggested methodological solution under regularity conditions and demonstrate through simulations that modern causal discovery methods can provide, within limits, the necessary quantification. Our framework complements existing robust causal inference approaches by addressing a distinct source of uncertainty typically overlooked in practice.


Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown strong performance across natural language reasoning tasks, yet their reasoning processes remain brittle and difficult to interpret. Prompting techniques like Chain-of-Thought (CoT) enhance reliability by eliciting intermediate reasoning steps or aggregating multiple outputs. However, they lack mechanisms for enforcing logical structure and assessing internal coherence. We introduce Theorem-of-Thought (ToTh), a novel framework that models reasoning as collaboration among three parallel agents, each simulating a distinct mode of inference: abductive, deductive, and inductive. Each agent produces a reasoning trace, which is structured into a formal reasoning graph. To evaluate consistency, we apply Bayesian belief propagation guided by natural language inference (NLI), assigning confidence scores to each step. The most coherent graph is selected to derive the final answer. Experiments on symbolic (WebOfLies) and numerical (MultiArith) reasoning benchmarks show that ToTh consistently outperforms CoT, Self-Consistency, and CoT-Decoding across multiple LLMs, while producing interpretable and logically grounded reasoning chains. Our findings suggest a promising direction for building more robust and cognitively inspired LLM reasoning. The implementation is available at https://github.com/KurbanIntelligenceLab/theorem-of-thought.


Consensus-Driven Active Model Selection

arXiv.org Artificial Intelligence

The widespread availability of off-the-shelf machine learning models poses a challenge: which model, of the many available candidates, should be chosen for a given data analysis task? This question of model selection is traditionally answered by collecting and annotating a validation dataset -- a costly and time-intensive process. We propose a method for active model selection, using predictions from candidate models to prioritize the labeling of test data points that efficiently differentiate the best candidate. Our method, CODA, performs consensus-driven active model selection by modeling relationships between classifiers, categories, and data points within a probabilistic framework. The framework uses the consensus and disagreement between models in the candidate pool to guide the label acquisition process, and Bayesian inference to update beliefs about which model is best as more information is collected. We validate our approach by curating a collection of 26 benchmark tasks capturing a range of model selection scenarios. CODA outperforms existing methods for active model selection significantly, reducing the annotation effort required to discover the best model by upwards of 70% compared to the previous state-of-the-art. Code and data are available at https://github.com/justinkay/coda.


LVM-GP: Uncertainty-Aware PDE Solver via coupling latent variable model and Gaussian process

arXiv.org Machine Learning

We propose a novel probabilistic framework, termed LVM-GP, for uncertainty quantification in solving forward and inverse partial differential equations (PDEs) with noisy data. The core idea is to construct a stochastic mapping from the input to a high-dimensional latent representation, enabling uncertainty-aware prediction of the solution. Specifically, the architecture consists of a confidence-aware encoder and a probabilistic decoder. The encoder implements a high-dimensional latent variable model based on a Gaussian process (LVM-GP), where the latent representation is constructed by interpolating between a learnable deterministic feature and a Gaussian process prior, with the interpolation strength adaptively controlled by a confidence function learned from data. The decoder defines a conditional Gaussian distribution over the solution field, where the mean is predicted by a neural operator applied to the latent representation, allowing the model to learn flexible function-to-function mapping. Moreover, physical laws are enforced as soft constraints in the loss function to ensure consistency with the underlying PDE structure. Compared to existing approaches such as Bayesian physics-informed neural networks (B-PINNs) and deep ensembles, the proposed framework can efficiently capture functional dependencies via merging a latent Gaussian process and neural operator, resulting in competitive predictive accuracy and robust uncertainty quantification. Numerical experiments demonstrate the effectiveness and reliability of the method.


Simulating Posterior Bayesian Neural Networks with Dependent Weights

arXiv.org Machine Learning

The theoretical study of Bayesian neural networks was initiated by Neal [29] who proved that if a shallow Bayesian neural network is initialized with independent Gaussian parameters (i.e., biases and weights), then the output of the network converges in distribution to a Gaussian process, as the number of neurons grows large ( i.e., in the wide width limit). This result was extended to Bayesian deep neural networks two decades later (see [16, 22, 26]) and only recently it has been made quantitative by the use of the optimal transport theory (see [6] and [33]), by the Stein method for Gaussian approximation (see [3, 4, 8, 13]), and by alternative techniques ([7, 11]). Another promising approach to analyze Bayesian neural networks is through the lens of large deviations. First results in this direction are given in [23]. These findings have been successively generalized in [2, 34]. A different perspective is provided by the so-called mean field analysis of networks (see [27, 15]). The advantage of the Bayesian framework is that it allows to include in the model both prior knowledge and observed data through a prior distribution on network's parameters and a likelihood function, respectively. The emergence of Gaussian processes helped to understand how large neural networks work, how to make them more efficient, and motivated the use of Bayesian regression inference methods, see [22]. However, as noticed by [28] and [21], the connection with Gaussian processes also highlighted the limitations of wide width neural networks with independent and Gaussian distributed weights.


Transductive Model Selection under Prior Probability Shift

arXiv.org Artificial Intelligence

Transductive learning is a supervised machine learning task in which, unlike in traditional inductive learning, the unlabelled data that require labelling are a finite set and are available at training time. Similarly to inductive learning contexts, transductive learning contexts may be affected by dataset shift, i.e., may be such that the IID assumption does not hold. We here propose a method, tailored to transductive classification contexts, for performing model selection (i.e., hyperparameter optimisation) when the data exhibit prior probability shift, an important type of dataset shift typical of anti-causal learning problems. In our proposed method the hyperparameters can be optimised directly on the unlabelled data to which the trained classifier must be applied; this is unlike traditional model selection methods, that are based on performing cross-validation on the labelled training data. We provide experimental results that show the benefits brought about by our method.


Accident-Driven Congestion Prediction and Simulation: An Explainable Framework Using Advanced Clustering and Bayesian Networks

arXiv.org Artificial Intelligence

Traffic congestion due to uncertainties, such as accidents, is a significant issue in urban areas, as the ripple effect of accidents causes longer delays, increased emissions, and safety concerns. To address this issue, we propose a robust framework for predicting the impact of accidents on congestion. We implement Automated Machine Learning (AutoML)-enhanced Deep Embedding Clustering (DEC) to assign congestion labels to accident data and predict congestion probability using a Bayesian Network (BN). The Simulation of Urban Mobility (SUMO) simulation is utilized to evaluate the correctness of BN predictions using evidence-based scenarios. Results demonstrate that the AutoML-enhanced DEC has outperformed traditional clustering approaches. The performance of the proposed BN model achieved an overall accuracy of 95.6%, indicating its ability to understand the complex relationship of accidents causing congestion. Validation in SUMO with evidence-based scenarios demonstrated that the BN model's prediction of congestion states closely matches those of SUMO, indicating the high reliability of the proposed BN model in ensuring smooth urban mobility.


Proto-EVFL: Enhanced Vertical Federated Learning via Dual Prototype with Extremely Unaligned Data

arXiv.org Artificial Intelligence

--In vertical federated learning (VFL), multiple enterprises address aligned sample scarcity by leveraging massive locally unaligned samples to facilitate collaborative learning. However, unaligned samples across different parties in VFL can be extremely class-imbalanced, leading to insufficient feature representation and limited model prediction space. Specifically, class-imbalanced problems consist of intra-party class imbalance and inter-party class imbalance, which can further cause local model bias and feature contribution inconsistency issues, respectively. T o address the above challenges, we propose Proto-EVFL, an enhanced VFL framework via dual prototypes. We first introduce class prototypes for each party to learn relationships between classes in the latent space, allowing the active party to predict unseen classes. We further design a probabilistic dual prototype learning scheme to dynamically select unaligned samples by conditional optimal transport cost with class prior probability. Moreover, a mixed prior guided module guides this selection process by combining local and global class prior probabilities. Finally, we adopt an adaptive gated feature aggregation strategy to mitigate feature contribution inconsistency by dynamically weighting and aggregating local features across different parties. We proved that Proto-EVFL, as the first bi-level optimization framework in VFL, has a convergence rate of 1 / T . Even in a zero-shot scenario with one unseen class, it outperforms baselines by at least 6.97%. NTRODUCTION indicates equal contribution, * represents the corresponding authors Wei Guo, Yiyang Duan and Fuzhen Zhuang are with the School of Artificial Intelligence, Beihang University, Beijing 100083, China (e-mail: { guowei, duanyiyang, zhuangfuzhen }@buaa.edu.cn). Xiao Zhang is with the School of Computer Science and Technology, Shan-dong University, Shandong 266237, China (e-mail: xiaozhang@sdu.edu.cn). Zhaojun Hu is with the Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing 100872, China (e-mail: huzhao-jun@ruc.edu.cn).