Goto

Collaborating Authors

 Bayesian Learning


Extremely Greedy Equivalence Search

arXiv.org Machine Learning

The goal of causal discovery is to learn a directed acyclic graph from data. One of the most well-known methods for this problem is Greedy Equivalence Search (GES). GES searches for the graph by incrementally and greedily adding or removing edges to maximize a model selection criterion. It has strong theoretical guarantees on infinite data but can fail in practice on finite data. In this paper, we first identify some of the causes of GES's failure, finding that it can get blocked in local optima, especially in denser graphs. We then propose eXtremely Greedy Equivalent Search (XGES), which involves a new heuristic to improve the search strategy of GES while retaining its theoretical guarantees. In particular, XGES favors deleting edges early in the search over inserting edges, which reduces the possibility of the search ending in local optima. A further contribution of this work is an efficient algorithmic formulation of XGES (and GES). We benchmark XGES on simulated datasets with known ground truth. We find that XGES consistently outperforms GES in recovering the correct graphs, and it is 10 times faster. XGES implementations in Python and C++ are available at https://github.com/ANazaret/XGES.


AI-Powered Bayesian Inference

arXiv.org Machine Learning

The advent of Generative Artificial Intelligence (GAI) has heralded an inflection point that changed how society thinks about knowledge acquisition. While GAI cannot be fully trusted for decision-making, it may still provide valuable information that can be integrated into a decision pipeline. Rather than seeing the lack of certitude and inherent randomness of GAI as a problem, we view it as an opportunity. Indeed, variable answers to given prompts can be leveraged to construct a prior distribution which reflects assuredness of AI predictions. This prior distribution may be combined with tailored datasets for a fully Bayesian analysis with an AI-driven prior. In this paper, we explore such a possibility within a non-parametric Bayesian framework. The basic idea consists of assigning a Dirichlet process prior distribution on the data-generating distribution with AI generative model as its baseline. Hyper-parameters of the prior can be tuned out-of-sample to assess the informativeness of the AI prior. Posterior simulation is achieved by computing a suitably randomized functional on an augmented data that consists of observed (labeled) data as well as fake data whose labels have been imputed using AI. This strategy can be parallelized and rapidly produces iid samples from the posterior by optimization as opposed to sampling from conditionals. Our method enables (predictive) inference and uncertainty quantification leveraging AI predictions in a coherent probabilistic manner.


Near-Optimal Approximations for Bayesian Inference in Function Space

arXiv.org Machine Learning

We propose a scalable inference algorithm for Bayes posteriors defined on a reproducing kernel Hilbert space (RKHS). Given a likelihood function and a Gaussian random element representing the prior, the corresponding Bayes posterior measure $\Pi_{\text{B}}$ can be obtained as the stationary distribution of an RKHS-valued Langevin diffusion. We approximate the infinite-dimensional Langevin diffusion via a projection onto the first $M$ components of the Kosambi-Karhunen-Lo\`eve expansion. Exploiting the thus obtained approximate posterior for these $M$ components, we perform inference for $\Pi_{\text{B}}$ by relying on the law of total probability and a sufficiency assumption. The resulting method scales as $O(M^3+JM^2)$, where $J$ is the number of samples produced from the posterior measure $\Pi_{\text{B}}$. Interestingly, the algorithm recovers the posterior arising from the sparse variational Gaussian process (SVGP) (see Titsias, 2009) as a special case, owed to the fact that the sufficiency assumption underlies both methods. However, whereas the SVGP is parametrically constrained to be a Gaussian process, our method is based on a non-parametric variational family $\mathcal{P}(\mathbb{R}^M)$ consisting of all probability measures on $\mathbb{R}^M$. As a result, our method is provably close to the optimal $M$-dimensional variational approximation of the Bayes posterior $\Pi_{\text{B}}$ in $\mathcal{P}(\mathbb{R}^M)$ for convex and Lipschitz continuous negative log likelihoods, and coincides with SVGP for the special case of a Gaussian error likelihood.


Multi-class Seismic Building Damage Assessment from InSAR Imagery using Quadratic Variational Causal Bayesian Inference

arXiv.org Artificial Intelligence

Interferometric Synthetic Aperture Radar (InSAR) technology uses satellite radar to detect surface deformation patterns and monitor earthquake impacts on buildings. While vital for emergency response planning, extracting multi-class building damage classifications from InSAR data faces challenges: overlapping damage signatures with environmental noise, computational complexity in multi-class scenarios, and the need for rapid regional-scale processing. Our novel multi-class variational causal Bayesian inference framework with quadratic variational bounds provides rigorous approximations while ensuring efficiency. By integrating InSAR observations with USGS ground failure models and building fragility functions, our approach separates building damage signals while maintaining computational efficiency through strategic pruning. Evaluation across five major earthquakes (Haiti 2021, Puerto Rico 2020, Zagreb 2020, Italy 2016, Ridgecrest 2019) shows improved damage classification accuracy (AUC: 0.94-0.96), achieving up to 35.7% improvement over existing methods. Our approach maintains high accuracy (AUC > 0.93) across all damage categories while reducing computational overhead by over 40% without requiring extensive ground truth data.


Uncertainty-aware abstention in medical diagnosis based on medical texts

arXiv.org Artificial Intelligence

This study addresses the critical issue of reliability for AI-assisted medical diagnosis. We focus on the selection prediction approach that allows the diagnosis system to abstain from providing the decision if it is not confident in the diagnosis. Such selective prediction (or abstention) approaches are usually based on the modeling predictive uncertainty of machine learning models involved. This study explores uncertainty quantification in machine learning models for medical text analysis, addressing diverse tasks across multiple datasets. We focus on binary mortality prediction from textual data in MIMIC-III, multi-label medical code prediction using ICD-10 codes from MIMIC-IV, and multi-class classification with a private outpatient visits dataset. Additionally, we analyze mental health datasets targeting depression and anxiety detection, utilizing various text-based sources, such as essays, social media posts, and clinical descriptions. In addition to comparing uncertainty methods, we introduce HUQ-2, a new state-of-the-art method for enhancing reliability in selective prediction tasks. Our results provide a detailed comparison of uncertainty quantification methods. They demonstrate the effectiveness of HUQ-2 in capturing and evaluating uncertainty, paving the way for more reliable and interpretable applications in medical text analysis.


From Small to Large Language Models: Revisiting the Federalist Papers

arXiv.org Machine Learning

For a long time, the authorship of the Federalist Papers had been a subject of inquiry and debate, not only by linguists and historians but also by statisticians. In what was arguably the first Bayesian case study, Mosteller and Wallace (1963) provided the first statistical evidence for attributing all disputed papers to Madison. Our paper revisits this historical dataset but from a lens of modern language models, both small and large. We review some of the more popular Large Language Model (LLM) tools and examine them from a statistical point of view in the context of text classification. We investigate whether, without any attempt to fine-tune, the general embedding constructs can be useful for stylometry and attribution. We explain differences between various word/phrase embeddings and discuss how to aggregate them in a document. Contrary to our expectations, we exemplify that dimension expansion with word embeddings may not always be beneficial for attribution relative to dimension reduction with topic embeddings. Our experiments demonstrate that default LLM embeddings (even after manual fine-tuning) may not consistently improve authorship attribution accuracy. Instead, Bayesian analysis with topic embeddings trained on ``function words" yields superior out-of-sample classification performance. This suggests that traditional (small) statistical language models, with their interpretability and solid theoretical foundation, can offer significant advantages in authorship attribution tasks. The code used in this analysis is available at github.com/sowonjeong/slm-to-llm


Learning sparse generalized linear models with binary outcomes via iterative hard thresholding

arXiv.org Machine Learning

In statistics, generalized linear models (GLMs) are widely used for modeling data and can expressively capture potential nonlinear dependence of the model's outcomes on its covariates. Within the broad family of GLMs, those with binary outcomes, which include logistic and probit regressions, are motivated by common tasks such as binary classification with (possibly) non-separable data. In addition, in modern machine learning and statistics, data is often high-dimensional yet has a low intrinsic dimension, making sparsity constraints in models another reasonable consideration. In this work, we propose to use and analyze an iterative hard thresholding (projected gradient descent on the ReLU loss) algorithm, called binary iterative hard thresholding (BIHT), for parameter estimation in sparse GLMs with binary outcomes. We establish that BIHT is statistically efficient and converges to the correct solution for parameter estimation in a general class of sparse binary GLMs. Unlike many other methods for learning GLMs, including maximum likelihood estimation, generalized approximate message passing, and GLM-tron (Kakade et al. 2011; Bahmani et al. 2016), BIHT does not require knowledge of the GLM's link function, offering flexibility and generality in allowing the algorithm to learn arbitrary binary GLMs. As two applications, logistic and probit regression are additionally studied. In this regard, it is shown that in logistic regression, the algorithm is in fact statistically optimal in the sense that the order-wise sample complexity matches (up to logarithmic factors) the lower bound obtained previously. To the best of our knowledge, this is the first work achieving statistical optimality for logistic regression in all noise regimes with a computationally efficient algorithm. Moreover, for probit regression, our sample complexity is on the same order as that obtained for logistic regression.


A comparative analysis of rank aggregation methods for the partial label ranking problem

arXiv.org Machine Learning

The label ranking problem is a supervised learning scenario in which the learner predicts a total order of the class labels for a given input instance. Recently, research has increasingly focused on the partial label ranking problem, a generalization of the label ranking problem that allows ties in the predicted orders. So far, most existing learning approaches for the partial label ranking problem rely on approximation algorithms for rank aggregation in the final prediction step. This paper explores several alternative aggregation methods for this critical step, including scoring-based and probabilistic-based rank aggregation approaches. To enhance their suitability for the more general partial label ranking problem, the investigated methods are extended to increase the likelihood of producing ties. Experimental evaluations on standard benchmarks demonstrate that scoring-based variants consistently outperform the current state-of-the-art method in handling incomplete information. In contrast, probabilistic-based variants fail to achieve competitive performance.


Unraveling particle dark matter with Physics-Informed Neural Networks

arXiv.org Artificial Intelligence

We parametrically solve the Boltzmann equations governing freeze-in dark matter (DM) in alternative cosmologies with Physics-Informed Neural Networks (PINNs), a mesh-free method. Through inverse PINNs, using a single DM experimental point -- observed relic density -- we determine the physical attributes of the theory, namely power-law cosmologies, inspired by braneworld scenarios, and particle interaction cross sections. The expansion of the Universe in such alternative cosmologies has been parameterized through a switch-like function reproducing the Hubble law at later times. Without loss of generality, we model more realistically this transition with a smooth function. We predict a distinct pair-wise relationship between power-law exponent and particle interactions: for a given cosmology with negative (positive) exponent, smaller (larger) cross sections are required to reproduce the data. Lastly, via Bayesian methods, we quantify the epistemic uncertainty of theoretical parameters found in inverse problems.


Sentiment analysis of texts from social networks based on machine learning methods for monitoring public sentiment

arXiv.org Artificial Intelligence

A sentiment analysis system powered by machine learning was created in this study to improve real-time social network public opinion monitoring. For sophisticated sentiment identification, the suggested approach combines cutting-edge transformer-based architectures (DistilBERT, RoBERTa) with traditional machine learning models (Logistic Regression, SVM, Naive Bayes). The system achieved an accuracy of up to 80-85% using transformer models in real-world scenarios after being tested using both deep learning techniques and standard machine learning processes on annotated social media datasets. According to experimental results, deep learning models perform noticeably better than lexicon-based and conventional rule-based classifiers, lowering misclassification rates and enhancing the ability to recognize nuances like sarcasm. According to feature importance analysis, context tokens, sentiment-bearing keywords, and part-of-speech structure are essential for precise categorization. The findings confirm that AI-driven sentiment frameworks can provide a more adaptive and efficient approach to modern sentiment challenges. Despite the system's impressive performance, issues with computing overhead, data quality, and domain-specific terminology still exist. In order to monitor opinions on a broad scale, future research will investigate improving computing performance, extending coverage to various languages, and integrating real-time streaming APIs. The results demonstrate that governments, corporations, and social researchers looking for more in-depth understanding of public mood on digital platforms can find a reliable and adaptable answer in AI-powered sentiment analysis.