Goto

Collaborating Authors

 Bayesian Inference


Borrowing Strength in Distributionally Robust Optimization via Hierarchical Dirichlet Processes

arXiv.org Machine Learning

This paper presents a novel optimization framework to address key challenges presented by modern machine learning applications: High dimensionality, distributional uncertainty, and data heterogeneity. Our approach unifies regularized estimation, distributionally robust optimization (DRO), and hierarchical Bayesian modeling in a single data-driven criterion. By employing a hierarchical Dirichlet process (HDP) prior, the method effectively handles multi-source data, achieving regularization, distributional robustness, and borrowing strength across diverse yet related data-generating processes. We demonstrate the method's advantages by establishing theoretical performance guarantees and tractable Monte Carlo approximations based on Dirichlet process (DP) theory. Numerical experiments validate the framework's efficacy in improving and stabilizing both prediction and parameter estimation accuracy, showcasing its potential for application in complex data environments.


Gaussian Measures Conditioned on Nonlinear Observations: Consistency, MAP Estimators, and Simulation

arXiv.org Machine Learning

The article presents a systematic study of the problem of conditioning a Gaussian random variable $\xi$ on nonlinear observations of the form $F \circ \phi(\xi)$ where $\phi: \mathcal{X} \to \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the conditioned random variable $\xi \mid F\circ \phi(\xi)$, stating that it decomposes as the sum of an infinite-dimensional Gaussian (which is identified analytically) as well as a finite-dimensional non-Gaussian measure. We also introduce a novel notion of the mode of a conditional measure by taking the limit of the natural relaxation of the problem, to which we can apply the existing notion of maximum a posteriori estimators of posterior measures. Finally, we introduce a variant of the Laplace approximation for the efficient simulation of the aforementioned conditioned Gaussian random variables towards uncertainty quantification.


Beyond MLE: Investigating SEARNN for Low-Resourced Neural Machine Translation

arXiv.org Artificial Intelligence

Structured prediction tasks, like machine translation, involve learning functions that map structured inputs to structured outputs. Recurrent Neural Networks (RNNs) have historically been a popular choice for such tasks, including in natural language processing (NLP) applications. However, training RNNs using Maximum Likelihood Estimation (MLE) has its limitations, including exposure bias and a mismatch between training and testing metrics. SEARNN, based on the learning to search (L2S) framework, has been proposed as an alternative to MLE for RNN training. This project explored the potential of SEARNN to improve machine translation for low-resourced African languages -- a challenging task characterized by limited training data availability and the morphological complexity of the languages. Through experiments conducted on translation for English to Igbo, French to \ewe, and French to \ghomala directions, this project evaluated the efficacy of SEARNN over MLE in addressing the unique challenges posed by these languages. With an average BLEU score improvement of $5.4$\% over the MLE objective, we proved that SEARNN is indeed a viable algorithm to effectively train RNNs on machine translation for low-resourced languages.


Particle swarm optimization with Applications to Maximum Likelihood Estimation and Penalized Negative Binomial Regression

arXiv.org Machine Learning

These authors contribute to the paper equally. Abstract General purpose optimization routines such as nlminb, optim (R) or nlmixed (SAS) are frequently used to estimate model parameters in nonstandard distributions. This paper presents Particle Swarm Optimization (PSO), as an alternative to many of the current algorithms used in statistics. We find that PSO can not only reproduce the same results as the above routines, it can also produce results that are more optimal or when others cannot converge. In the latter case, it can also identify the source of the problem or problems. We highlight advantages of using PSO using four examples, where: (1) some parameters in a generalized distribution are unidentified using PSO when it is not apparent or computationally manifested using routines in R or SAS; (2) PSO can produce estimation results for the log-binomial regressions when current routines may not; (3) PSO provides flexibility in the link function for binomial regression with LASSO penalty, which is unsupported by standard packages like GLM and GENMOD in Stata and SAS, respectively, and (4) PSO provides superior MLE estimates for an EE-IW distribution compared with those from the traditional statistical methods that rely on moments. Metaheuristics, and in particular, nature-inspired metaheuristic algorithms, is increasingly used across disciplines to tackle challenging optimization problems [11]. They may be broadly categorized swarm based or evolutionary based algorithms. Some examples of the former are particle swarm optimization and competitive swarm optimizer (CSO) and examples of the latter are genetic algorithm (GA) and the differential evolution. The statistical community is probably most aware of GA and simulated annealing (SA) but they are many others that have recently proven more popular in engineering and computer science.


Towards Optimal Beacon Placement for Range-Aided Localization

arXiv.org Artificial Intelligence

Range-based localization is ubiquitous: global navigation satellite systems (GNSS) power mobile phone-based navigation, and autonomous mobile robots can use range measurements from a variety of modalities including sonar, radar, and even WiFi signals. Many of these localization systems rely on fixed anchors or beacons with known positions acting as transmitters or receivers. In this work, we answer a fundamental question: given a set of positions we would like to localize, how should beacons be placed so as to minimize localization error? Specifically, we present an information theoretic method for optimally selecting an arrangement consisting of a few beacons from a large set of candidate positions. By formulating localization as maximum a posteriori (MAP) estimation, we can cast beacon arrangement as a submodular set function maximization problem. This approach is probabilistically rigorous, simple to implement, and extremely flexible. Furthermore, we prove that the submodular structure of our problem formulation ensures that a greedy algorithm for beacon arrangement has suboptimality guarantees. We compare our method with a number of benchmarks on simulated data and release an open source Python implementation of our algorithm and experiments.


Accelerating Multilevel Markov Chain Monte Carlo Using Machine Learning Models

arXiv.org Machine Learning

This work presents an efficient approach for accelerating multilevel Markov Chain Monte Carlo (MCMC) sampling for large-scale problems using low-fidelity machine learning models. While conventional techniques for large-scale Bayesian inference often substitute computationally expensive high-fidelity models with machine learning models, thereby introducing approximation errors, our approach offers a computationally efficient alternative by augmenting high-fidelity models with low-fidelity ones within a hierarchical framework. The multilevel approach utilizes the low-fidelity machine learning model (MLM) for inexpensive evaluation of proposed samples thereby improving the acceptance of samples by the high-fidelity model. The hierarchy in our multilevel algorithm is derived from geometric multigrid hierarchy. We utilize an MLM to acclerate the coarse level sampling. Training machine learning model for the coarsest level significantly reduces the computational cost associated with generating training data and training the model. We present an MCMC algorithm to accelerate the coarsest level sampling using MLM and account for the approximation error introduced. We provide theoretical proofs of detailed balance and demonstrate that our multilevel approach constitutes a consistent MCMC algorithm. Additionally, we derive conditions on the accuracy of the machine learning model to facilitate more efficient hierarchical sampling. Our technique is demonstrated on a standard benchmark inference problem in groundwater flow, where we estimate the probability density of a quantity of interest using a four-level MCMC algorithm. Our proposed algorithm accelerates multilevel sampling by a factor of two while achieving similar accuracy compared to sampling using the standard multilevel algorithm.


The Landscape of Unfolding with Machine Learning

arXiv.org Artificial Intelligence

Recent innovations from machine learning allow for data unfolding, without binning and including correlations across many dimensions. We describe a set of known, upgraded, and new methods for ML-based unfolding. The performance of these approaches are evaluated on the same two datasets. We find that all techniques are capable of accurately reproducing the particle-level spectra across complex observables. Given that these approaches are conceptually diverse, they offer an exciting toolkit for a new class of measurements that can probe the Standard Model with an unprecedented level of detail and may enable sensitivity to new phenomena.


Decoding Geometric Properties in Non-Random Data from First Information-Theoretic Principles

arXiv.org Artificial Intelligence

Based on the principles of information theory, measure theory, and theoretical computer science, we introduce a univariate signal deconvolution method with a wide range of applications to coding theory, particularly in zero-knowledge one-way communication channels, such as in deciphering messages from unknown generating sources about which no prior knowledge is available and to which no return message can be sent. Our multidimensional space reconstruction method from an arbitrary received signal is proven to be agnostic vis-a-vis the encoding-decoding scheme, computation model, programming language, formal theory, the computable (or semi-computable) method of approximation to algorithmic complexity, and any arbitrarily chosen (computable) probability measure of the events. The method derives from the principles of an approach to Artificial General Intelligence capable of building a general-purpose model of models independent of any arbitrarily assumed prior probability distribution. We argue that this optimal and universal method of decoding non-random data has applications to signal processing, causal deconvolution, topological and geometric properties encoding, cryptography, and bio- and technosignature detection.


Active Learning with Fully Bayesian Neural Networks for Discontinuous and Nonstationary Data

arXiv.org Artificial Intelligence

Active learning optimizes the exploration of large parameter spaces by strategically selecting which experiments or simulations to conduct, thus reducing resource consumption and potentially accelerating scientific discovery. A key component of this approach is a probabilistic surrogate model, typically a Gaussian Process (GP), which approximates an unknown functional relationship between control parameters and a target property. However, conventional GPs often struggle when applied to systems with discontinuities and non-stationarities, prompting the exploration of alternative models. This limitation becomes particularly relevant in physical science problems, which are often characterized by abrupt transitions between different system states and rapid changes in physical property behavior. Fully Bayesian Neural Networks (FBNNs) serve as a promising substitute, treating all neural network weights probabilistically and leveraging advanced Markov Chain Monte Carlo techniques for direct sampling from the posterior distribution. This approach enables FBNNs to provide reliable predictive distributions, crucial for making informed decisions under uncertainty in the active learning setting. Although traditionally considered too computationally expensive for 'big data' applications, many physical sciences problems involve small amounts of data in relatively low-dimensional parameter spaces. Here, we assess the suitability and performance of FBNNs with the No-U-Turn Sampler for active learning tasks in the 'small data' regime, highlighting their potential to enhance predictive accuracy and reliability on test functions relevant to problems in physical sciences.


RuleFuser: Injecting Rules in Evidential Networks for Robust Out-of-Distribution Trajectory Prediction

arXiv.org Artificial Intelligence

Modern neural trajectory predictors in autonomous driving are developed using imitation learning (IL) from driving logs. Although IL benefits from its ability to glean nuanced and multi-modal human driving behaviors from large datasets, the resulting predictors often struggle with out-of-distribution (OOD) scenarios and with traffic rule compliance. On the other hand, classical rule-based predictors, by design, can predict traffic rule satisfying behaviors while being robust to OOD scenarios, but these predictors fail to capture nuances in agent-to-agent interactions and human driver's intent. In this paper, we present RuleFuser, a posterior-net inspired evidential framework that combines neural predictors with classical rule-based predictors to draw on the complementary benefits of both, thereby striking a balance between performance and traffic rule compliance. The efficacy of our approach is demonstrated on the real-world nuPlan dataset where RuleFuser leverages the higher performance of the neural predictor in in-distribution (ID) scenarios and the higher safety offered by the rule-based predictor in OOD scenarios.