Bayesian Learning
When Additive Noise Meets Unobserved Mediators: Bivariate Denoising Diffusion for Causal Discovery
Meier, Dominik, Hiremath, Sujai, Ghosal, Promit, Gan, Kyra
Distinguishing cause and effect from bivariate observational data is a foundational problem in many disciplines, but challenging without additional assumptions. Additive noise models (ANMs) are widely used to enable sample-efficient bivariate causal discovery. However, conventional ANM-based methods fail when unobserved mediators corrupt the causal relationship between variables. This paper makes three key contributions: first, we rigorously characterize why standard ANM approaches break down in the presence of unmeasured mediators. Second, we demonstrate that prior solutions for hidden mediation are brittle in finite sample settings, limiting their practical utility. To address these gaps, we propose Bivariate Denoising Diffusion (BiDD) for causal discovery, a method designed to handle latent noise introduced by unmeasured mediators. Unlike prior methods that infer directionality through mean squared error loss comparisons, our approach introduces a novel independence test statistic: during the noising and denoising processes for each variable, we condition on the other variable as input and evaluate the independence of the predicted noise relative to this input. We prove asymptotic consistency of BiDD under the ANM, and conjecture that it performs well under hidden mediation. Experiments on synthetic and real-world data demonstrate consistent performance, outperforming existing methods in mediator-corrupted settings while maintaining strong performance in mediator-free settings.
Data Can Speak for Itself: Quality-guided Utilization of Wireless Synthetic Data
Gong, Chen, Liang, Bo, Gao, Wei, Xu, Chenren
Generative models have gained significant attention for their ability to produce realistic synthetic data that supplements the quantity of real-world datasets. While recent studies show performance improvements in wireless sensing tasks by incorporating all synthetic data into training sets, the quality of synthetic data remains unpredictable and the resulting performance gains are not guaranteed. To address this gap, we propose tractable and generalizable metrics to quantify quality attributes of synthetic data - affinity and diversity. Our assessment reveals prevalent affinity limitation in current wireless synthetic data, leading to mislabeled data and degraded task performance. We attribute the quality limitation to generative models' lack of awareness of untrained conditions and domain-specific processing. To mitigate these issues, we introduce SynCheck, a quality-guided synthetic data utilization scheme that refines synthetic data quality during task model training. Our evaluation demonstrates that SynCheck consistently outperforms quality-oblivious utilization of synthetic data, and achieves 4.3% performance improvement even when the previous utilization degrades performance by 13.4%.
Scalable Structure Learning of Bayesian Networks by Learning Algorithm Ensembles
Liu, Shengcai, Ou-yang, Hui, Wang, Zhiyuan, Chen, Cheng, Cai, Qijun, Ong, Yew-Soon, Tang, Ke
--Learning the structure of Bayesian networks (BNs) from data is challenging, especially for datasets involving a large number of variables. The recently proposed divide-and-conquer (D&D) strategies present a promising approach for learning large BNs. However, they still face a main issue of unstable learning accuracy across subproblems. In this work, we introduce the idea of employing structure learning ensemble (SLE), which combines multiple BN structure learning algorithms, to consistently achieve high learning accuracy. We further propose an automatic approach called Auto-SLE for learning near-optimal SLEs, addressing the challenge of manually designing high-quality SLEs. The learned SLE is then integrated into a D&D method. Extensive experiments firmly show the superiority of our method over D&D methods with single BN structure learning algorithm in learning large BNs, achieving accuracy improvement usually by 30% 225% on datasets involving 10,000 variables. These results indicate the significant potential of employing (automatic learning of) SLEs for scalable BN structure learning. Learning the structure of Bayesian networks (BNs) [1] from data has attracted much research interest, due to its wide applications in machine learning, statistical modeling, and causal inference [2]-[4].
Overcoming Dimensional Factorization Limits in Discrete Diffusion Models through Quantum Joint Distribution Learning
Chen, Chuangtao, Zhao, Qinglin, Zhou, MengChu, Niyato, Dusit, He, Zhimin, Situ, Haozhen
Discrete diffusion models represent a significant advance in generative modeling, demonstrating remarkable success in synthesizing complex, high-quality discrete data. However, to avoid exponential computational costs, they typically rely on calculating per-dimension transition probabilities when learning high-dimensional distributions. In this study, we rigorously prove that this approach leads to a worst-case linear scaling of Kullback-Leibler (KL) divergence with data dimension. To address this, we propose a Quantum Discrete Denoising Diffusion Probabilistic Model (QD3PM), which enables joint probability learning through diffusion and denoising in exponentially large Hilbert spaces, offering a theoretical pathway to faithfully capture the true joint distribution. By deriving posterior states through quantum Bayes' theorem, similar to the crucial role of posterior probabilities in classical diffusion models, and by learning the joint probability, we establish a solid theoretical foundation for quantum-enhanced diffusion models. For denoising, we design a quantum circuit that utilizes temporal information for parameter sharing and incorporates learnable classical-data-controlled rotations for encoding. Exploiting joint distribution learning, our approach enables single-step sampling from pure noise, eliminating iterative requirements of existing models. Simulations demonstrate the proposed model's superior accuracy in modeling complex distributions compared to factorization methods. Hence, this paper establishes a new theoretical paradigm in generative models by leveraging the quantum advantage in joint distribution learning.
GKNet: Graph Kalman Filtering and Model Inference via Model-based Deep Learning
Sabbaqi, Mohammad, Taormina, Riccardo, Isufi, Elvin
Inference tasks with time series over graphs are of importance in applications such as urban water networks, economics, and networked neuroscience. Addressing these tasks typically relies on identifying a computationally affordable model that jointly captures the graph-temporal patterns of the data. In this work, we propose a graph-aware state space model for graph time series, where both the latent state and the observation equation are parametric graph-induced models with a limited number of parameters that need to be learned. More specifically, we consider the state equation to follow a stochastic partial differential equation driven by noise over the graphs edges accounting not only for potential edge uncertainties but also for increasing the degrees of freedom in the latter in a tractable manner. The graph structure conditioning of the noise dispersion allows the state variable to deviate from the stochastic process in certain neighborhoods. The observation model is a sampled and graph-filtered version of the state capturing multi-hop neighboring influence. The goal is to learn the parameters in both state and observation models from the partially observed data for downstream tasks such as prediction and imputation. The model is inferred first through a maximum likelihood approach that provides theoretical tractability but is limited in expressivity and scalability. To improve on the latter, we use the state-space formulation to build a principled deep learning architecture that jointly learns the parameters and tracks the state in an end-to-end manner in the spirit of Kalman neural networks.
Interactive Multi-Objective Probabilistic Preference Learning with Soft and Hard Bounds
Chen, Edward, Truong, Sang T., Dullerud, Natalie, Koyejo, Sanmi, Guestrin, Carlos
High-stakes decision-making involves navigating multiple competing objectives with expensive evaluations. For instance, in brachytherapy, clinicians must balance maximizing tumor coverage (e.g., an aspirational target or soft bound of >95% coverage) against strict organ dose limits (e.g., a non-negotiable hard bound of <601 cGy to the bladder), with each plan evaluation being resource-intensive. Selecting Pareto-optimal solutions that match implicit preferences is challenging, as exhaustive Pareto frontier exploration is computationally and cognitively prohibitive, necessitating interactive frameworks to guide users. While decision-makers (DMs) often possess domain knowledge to narrow the search via such soft-hard bounds, current methods often lack systematic approaches to iteratively refine these multi-faceted preference structures. Critically, DMs must trust their final decision, confident they haven't missed superior alternatives; this trust is paramount in high-consequence scenarios. We present Active-MoSH, an interactive local-global framework designed for this process. Its local component integrates soft-hard bounds with probabilistic preference learning, maintaining distributions over DM preferences and bounds for adaptive Pareto subset refinement. This is guided by an active sampling strategy optimizing exploration-exploitation while minimizing cognitive burden. To build DM trust, Active-MoSH's global component, T-MoSH, leverages multi-objective sensitivity analysis to identify potentially overlooked, high-value points beyond immediate feedback. We demonstrate Active-MoSH's performance benefits through diverse synthetic and real-world applications. A user study on AI-generated image selection further validates our hypotheses regarding the framework's ability to improve convergence, enhance DM trust, and provide expressive preference articulation, enabling more effective DMs.
Less Greedy Equivalence Search
Ejaz, Adiba, Bareinboim, Elias
Greedy Equivalence Search (GES) is a classic score-based algorithm for causal discovery from observational data. In the sample limit, it recovers the Markov equivalence class of graphs that describe the data. Still, it faces two challenges in practice: computational cost and finite-sample accuracy. In this paper, we develop Less Greedy Equivalence Search (LGES), a variant of GES that retains its theoretical guarantees while partially addressing these limitations. LGES modifies the greedy step: rather than always applying the highest-scoring insertion, it avoids edge insertions between variables for which the score implies some conditional independence. This more targeted search yields up to a \(10\)-fold speed-up and a substantial reduction in structural error relative to GES. Moreover, LGES can guide the search using prior assumptions, while correcting these assumptions when contradicted by the data. Finally, LGES can exploit interventional data to refine the learned observational equivalence class. We prove that LGES recovers the true equivalence class in the sample limit from observational and interventional data, even with misspecified prior assumptions. Experiments demonstrate that LGES outperforms GES and other baselines in speed, accuracy, and robustness to misspecified assumptions. Our code is available at https://github.com/CausalAILab/lges.
Using Large Language Models to Suggest Informative Prior Distributions in Bayesian Statistics
Riegler, Michael A., Hellton, Kristoffer Herland, Thambawita, Vajira, Hammer, Hugo L.
Selecting prior distributions in Bayesian statistics is challenging, resource-intensive, and subjective. We analyze using large-language models (LLMs) to suggest suitable, knowledge-based informative priors. We developed an extensive prompt asking LLMs not only to suggest priors but also to verify and reflect on their choices. We evaluated Claude Opus, Gemini 2.5 Pro, and ChatGPT-4o-mini on two real datasets: heart disease risk and concrete strength. All LLMs correctly identified the direction for all associations (e.g., that heart disease risk is higher for males). The quality of suggested priors was measured by their Kullback-Leibler divergence from the maximum likelihood estimator's distribution. The LLMs suggested both moderately and weakly informative priors. The moderate priors were often overconfident, resulting in distributions misaligned with the data. In our experiments, Claude and Gemini provided better priors than ChatGPT. For weakly informative priors, a key performance difference emerged: ChatGPT and Gemini defaulted to an "unnecessarily vague" mean of 0, while Claude did not, demonstrating a significant advantage. The ability of LLMs to identify correct associations shows their great potential as an efficient, objective method for developing informative priors. However, the primary challenge remains in calibrating the width of these priors to avoid over- and under-confidence.
Epistemic Artificial Intelligence is Essential for Machine Learning Models to Truly 'Know When They Do Not Know'
Manchingal, Shireen Kudukkil, Bradley, Andrew, Kooij, Julian F. P., Shariatmadar, Keivan, Yorke-Smith, Neil, Cuzzolin, Fabio
Despite AI's impressive achievements, including recent advances in generative and large language models, there remains a significant gap in the ability of AI systems to handle uncertainty and generalize beyond their training data. AI models consistently fail to make robust enough predictions when facing unfamiliar or adversarial data. Traditional machine learning approaches struggle to address this issue, due to an overemphasis on data fitting, while current uncertainty quantification approaches suffer from serious limitations. This position paper posits a paradigm shift towards epistemic artificial intelligence, emphasizing the need for models to learn from what they know while at the same time acknowledging their ignorance, using the mathematics of second-order uncertainty measures. This approach, which leverages the expressive power of such measures to efficiently manage uncertainty, offers an effective way to improve the resilience and robustness of AI systems, allowing them to better handle unpredictable real-world environments.
Offensive Language Detection on Social Media Using XLNet
Alothman, Reem, Benhidour, Hafida, Kerrache, Said
The widespread use of text-based communication on social media-through chats, comments, and microblogs-has improved user interaction but has also led to an increase in offensive content, including hate speech, racism, and other forms of abuse. Due to the enormous volume of user-generated content, manual moderation is impractical, which creates a need for automated systems that can detect offensive language. Deep learning models, particularly those using transfer learning, have demonstrated significant success in understanding natural language through large-scale pretraining. In this study, we propose an automatic offensive language detection model based on XLNet, a generalized autoregressive pretraining method, and compare its performance with BERT (Bidirectional Encoder Representations from Transformers), which is a widely used baseline in natural language processing (NLP). Both models are evaluated using the Offensive Language Identification Dataset (OLID), a benchmark Twitter dataset that includes hierarchical annotations. Our experimental results show that XLNet outperforms BERT in detecting offensive content and in categorizing the types of offenses, while BERT performs slightly better in identifying the targets of the offenses. Additionally, we find that oversampling and undersampling strategies are effective in addressing class imbalance and improving classification performance. These findings highlight the potential of transfer learning and XLNet-based architectures to create robust systems for detecting offensive language on social media platforms.