Bayesian Learning
A New Stochastic Approximation Method for Gradient-based Simulated Parameter Estimation
This paper tackles the challenge of parameter calibration in stochastic models, particularly in scenarios where the likelihood function is unavailable in an analytical form. We introduce a gradient-based simulated parameter estimation framework, which employs a multi-time scale stochastic approximation algorithm. This approach effectively addresses the ratio bias that arises in both maximum likelihood estimation and posterior density estimation problems. The proposed algorithm enhances estimation accuracy and significantly reduces computational costs, as demonstrated through extensive numerical experiments. Our work extends the GSPE framework to handle complex models such as hidden Markov models and variational inference-based problems, offering a robust solution for parameter estimation in challenging stochastic environments.
Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems
Lin, Zhidi, Li, Ying, Yin, Feng, Maroรฑas, Juan, Thiรฉry, Alexandre H.
Gaussian process state-space models (GPSSMs) have emerged as a powerful framework for modeling dynamical systems, offering interpretable uncertainty quantification and inherent regularization. However, existing GPSSMs face significant challenges in handling high-dimensional, non-stationary systems due to computational inefficiencies, limited scalability, and restrictive stationarity assumptions. In this paper, we propose an efficient transformed Gaussian process state-space model (ETGPSSM) to address these limitations. Our approach leverages a single shared Gaussian process (GP) combined with normalizing flows and Bayesian neural networks, enabling efficient modeling of complex, high-dimensional state transitions while preserving scalability. To address the lack of closed-form expressions for the implicit process in the transformed GP, we follow its generative process and introduce an efficient variational inference algorithm, aided by the ensemble Kalman filter (EnKF), to enable computationally tractable learning and inference. Extensive empirical evaluations on synthetic and real-world datasets demonstrate the superior performance of our ETGPSSM in system dynamics learning, high-dimensional state estimation, and time-series forecasting, outperforming existing GPSSMs and neural network-based methods in both accuracy and computational efficiency.
Graphical Transformation Models
Herp, Matthias, Brachem, Johannes, Altenbuchinger, Michael, Kneib, Thomas
Graphical Transformation Models (GTMs) are introduced as a novel approach to effectively model multivariate data with intricate marginals and complex dependency structures non-parametrically, while maintaining interpretability through the identification of varying conditional independencies. GTMs extend multivariate transformation models by replacing the Gaussian copula with a custom-designed multivariate transformation, offering two major advantages. Firstly, GTMs can capture more complex interdependencies using penalized splines, which also provide an efficient regularization scheme. Secondly, we demonstrate how to approximately regularize GTMs using a lasso penalty towards pairwise conditional independencies, akin to Gaussian graphical models. The model's robustness and effectiveness are validated through simulations, showcasing its ability to accurately learn parametric vine copulas and identify conditional independencies. Additionally, the model is applied to a benchmark astrophysics dataset, where the GTM demonstrates favorable performance compared to non-parametric vine copulas in learning complex multivariate distributions.
CardioTabNet: A Novel Hybrid Transformer Model for Heart Disease Prediction using Tabular Medical Data
Sumon, Md. Shaheenur Islam, Islam, Md. Sakib Bin, Rahman, Md. Sohanur, Hossain, Md. Sakib Abrar, Khandakar, Amith, Hasan, Anwarul, Murugappan, M, Chowdhury, Muhammad E. H.
The early detection and prediction of cardiovascular diseases are crucial for reducing the severe morbidity and mortality associated with these conditions worldwide. A multi-headed self-attention mechanism, widely used in natural language processing (NLP), is operated by Transformers to understand feature interactions in feature spaces. However, the relationships between various features within biological systems remain ambiguous in these spaces, highlighting the necessity of early detection and prediction of cardiovascular diseases to reduce the severe morbidity and mortality with these conditions worldwide. We handle this issue with CardioTabNet, which exploits the strength of tab transformer to extract feature space which carries strong understanding of clinical cardiovascular data and its feature ranking. As a result, performance of downstream classical models significantly showed outstanding result. Our study utilizes the open-source dataset for heart disease prediction with 1190 instances and 11 features. In total, 11 features are divided into numerical (age, resting blood pressure, cholesterol, maximum heart rate, old peak, weight, and fasting blood sugar) and categorical (resting ECG, exercise angina, and ST slope). Tab transformer was used to extract important features and ranked them using random forest (RF) feature ranking algorithm. Ten machine-learning models were used to predict heart disease using selected features. After extracting high-quality features, the top downstream model (a hyper-tuned ExtraTree classifier) achieved an average accuracy rate of 94.1% and an average Area Under Curve (AUC) of 95.0%. Furthermore, a nomogram analysis was conducted to evaluate the model's effectiveness in cardiovascular risk assessment. A benchmarking study was conducted using state-of-the-art models to evaluate our transformer-driven framework.
Principal Component Analysis When n < p: Challenges and Solutions
Weeraratne, Nuwan, Hunt, Lyn, Kurz, Jason
Principal Component Analysis is a key technique for reducing the complexity of high-dimensional data while preserving its fundamental data structure, ensuring models remain stable and interpretable. This is achieved by transforming the original variables into a new set of uncorrelated variables (principal components) based on the covariance structure of the original variables. However, since the traditional maximum likelihood covariance estimator does not accurately converge to the true covariance matrix, the standard principal component analysis performs poorly as a dimensionality reduction technique in high-dimensional scenarios $n
Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models
Qiu, Linlu, Sha, Fei, Allen, Kelsey, Kim, Yoon, Linzen, Tal, van Steenkiste, Sjoerd
Artificial intelligence systems based on large language models (LLMs) are increasingly used as agents that interact with users and with the world. To do so successfully, LLMs need to construct internal representations of the world and form probabilistic beliefs about those representations. To provide a user with personalized recommendations, for example, the LLM needs to gradually infer the user's preferences, over the course of multiple interactions. To evaluate whether contemporary LLMs are able to do so, we use the Bayesian inference framework from probability theory, which lays out the optimal way to update an agent's beliefs as it receives new information. We first show that the LLMs do not update their beliefs as expected from the Bayesian framework, and that consequently their predictions do not improve as expected as more information becomes available, even less so than we find is the case for humans. To address this issue, we teach the LLMs to reason in a Bayesian manner by training them to mimic the predictions of an optimal Bayesian model. We find that this approach not only significantly improves the LLM's performance on the particular recommendation task it is trained on, but also enables generalization to other tasks. This suggests that this method endows the LLM with broader Bayesian reasoning skills. More generally, our results indicate that LLMs can learn about reasoning strategies effectively and generalize those skills to new domains, which in part explains LLMs' empirical success.
Offline Model-Based Optimization: Comprehensive Review
Kim, Minsu, Gu, Jiayao, Yuan, Ye, Yun, Taeyoung, Liu, Zixuan, Bengio, Yoshua, Chen, Can
Offline optimization is a fundamental challenge in science and engineering, where the goal is to optimize black-box functions using only offline datasets. This setting is particularly relevant when querying the objective function is prohibitively expensive or infeasible, with applications spanning protein engineering, material discovery, neural architecture search, and beyond. The main difficulty lies in accurately estimating the objective landscape beyond the available data, where extrapolations are fraught with significant epistemic uncertainty. This uncertainty can lead to objective hacking(reward hacking), exploiting model inaccuracies in unseen regions, or other spurious optimizations that yield misleadingly high performance estimates outside the training distribution. Recent advances in model-based optimization(MBO) have harnessed the generalization capabilities of deep neural networks to develop offline-specific surrogate and generative models. Trained with carefully designed strategies, these models are more robust against out-of-distribution issues, facilitating the discovery of improved designs. Despite its growing impact in accelerating scientific discovery, the field lacks a comprehensive review. To bridge this gap, we present the first thorough review of offline MBO. We begin by formalizing the problem for both single-objective and multi-objective settings and by reviewing recent benchmarks and evaluation metrics. We then categorize existing approaches into two key areas: surrogate modeling, which emphasizes accurate function approximation in out-of-distribution regions, and generative modeling, which explores high-dimensional design spaces to identify high-performing designs. Finally, we examine the key challenges and propose promising directions for advancement in this rapidly evolving field including safe control of superintelligent systems.
A Guide to Bayesian Networks Software Packages for Structure and Parameter Learning -- 2025 Edition
Gaudillo, Joverlyn, Astrologo, Nicole, Stella, Fabio, Acerbi, Enzo, Canonaco, Francesco
A representation of the cause-effect mechanism is needed to enable artificial intelligence to represent how the world works. Bayesian Networks (BNs) have proven to be an effective and versatile tool for this task. BNs require constructing a structure of dependencies among variables and learning the parameters that govern these relationships. These tasks, referred to as structural learning and parameter learning, are actively investigated by the research community, with several algorithms proposed and no single method having established itself as standard. A wide range of software, tools, and packages have been developed for BNs analysis and made available to academic researchers and industry practitioners. As a consequence of having no one-size-fits-all solution, moving the first practical steps and getting oriented into this field is proving to be challenging to outsiders and beginners. In this paper, we review the most relevant tools and software for BNs structural and parameter learning to date, providing our subjective recommendations directed to an audience of beginners. In addition, we provide an extensive easy-to-consult overview table summarizing all software packages and their main features. By improving the reader understanding of which available software might best suit their needs, we improve accessibility to the field and make it easier for beginners to take their first step into it.
Fairness-Driven LLM-based Causal Discovery with Active Learning and Dynamic Scoring
Causal discovery (CD) plays a pivotal role in numerous scientific fields by clarifying the causal relationships that underlie phenomena observed in diverse disciplines. Despite significant advancements in CD algorithms that enhance bias and fairness analyses in machine learning, their application faces challenges due to the high computational demands and complexities of large-scale data. This paper introduces a framework that leverages Large Language Models (LLMs) for CD, utilizing a metadata-based approach akin to the reasoning processes of human experts. By shifting from pairwise queries to a more scalable breadth-first search (BFS) strategy, the number of required queries is reduced from quadratic to linear in terms of variable count, thereby addressing scalability concerns inherent in previous approaches. This method utilizes an Active Learning (AL) and a Dynamic Scoring Mechanism that prioritizes queries based on their potential information gain, combining mutual information, partial correlation, and LLM confidence scores to refine the causal graph more efficiently and accurately. This BFS query strategy reduces the required number of queries significantly, thereby addressing scalability concerns inherent in previous approaches. This study provides a more scalable and efficient solution for leveraging LLMs in fairness-driven CD, highlighting the effects of the different parameters on performance. We perform fairness analyses on the inferred causal graphs, identifying direct and indirect effects of sensitive attributes on outcomes. A comparison of these analyses against those from graphs produced by baseline methods highlights the importance of accurate causal graph construction in understanding bias and ensuring fairness in machine learning systems.
Machine Learning-Based Genomic Linguistic Analysis (Gene Sequence Feature Learning): A Case Study on Predicting Heavy Metal Response Genes in Rice
Yang, Ruiqi, Wang, Jianxu, Yuan, Wei, Wang, Xun, Li, Mei
This study explores the application of machine learning-based genetic linguistics for identifying heavy metal response genes in rice (Oryza sativa). By integrating convolutional neural networks and random forest algorithms, we developed a hybrid model capable of extracting and learning meaningful features from gene sequences, such as k-mer frequencies and physicochemical properties. The model was trained and tested on datasets of genes, achieving high predictive performance (precision: 0.89, F1-score: 0.82). RNA-seq and qRT-PCR experiments conducted on rice leaves which exposed to Hg0, revealed differential expression of genes associated with heavy metal responses, which validated the model's predictions. Co-expression network analysis identified 103 related genes, and a literature review indicated that these genes are highly likely to be involved in heavy metal-related biological processes. By integrating and comparing the analysis results with those of differentially expressed genes (DEGs), the validity of the new machine learning method was further demonstrated. This study highlights the efficacy of combining machine learning with genetic linguistics for large-scale gene prediction. It demonstrates a cost-effective and efficient approach for uncovering molecular mechanisms underlying heavy metal responses, with potential applications in developing stress-tolerant crop varieties.