Bayesian Learning
Structural restrictions in local causal discovery: identifying direct causes of a target variable
Bodik, Juraj, Chavez-Demoulin, Valรฉrie
We consider the problem of learning a set of direct causes of a target variable from an observational joint distribution. Learning directed acyclic graphs (DAGs) that represent the causal structure is a fundamental problem in science. Several results are known when the full DAG is identifiable from the distribution, such as assuming a nonlinear Gaussian data-generating process. Often, we are only interested in identifying the direct causes of one target variable (local causal structure), not the full DAG. In this paper, we discuss different assumptions for the data-generating process of the target variable under which the set of direct causes is identifiable from the distribution. While doing so, we put essentially no assumptions on the variables other than the target variable. In addition to the novel identifiability results, we provide two practical algorithms for estimating the direct causes from a finite random sample and demonstrate their effectiveness on several benchmark datasets. We apply this framework to learn direct causes of the reduction in fertility rates in different countries.
Simulation-based Inference for Cardiovascular Models
Wehenkel, Antoine, Behrmann, Jens, Miller, Andrew C., Sapiro, Guillermo, Sener, Ozan, Cuturi, Marco, Jacobsen, Jรถrn-Henrik
Over the past decades, hemodynamics simulators have steadily evolved and have become tools of choice for studying cardiovascular systems in-silico. While such tools are routinely used to simulate whole-body hemodynamics from physiological parameters, solving the corresponding inverse problem of mapping waveforms back to plausible physiological parameters remains both promising and challenging. Motivated by advances in simulation-based inference (SBI), we cast this inverse problem as statistical inference. In contrast to alternative approaches, SBI provides \textit{posterior distributions} for the parameters of interest, providing a \textit{multi-dimensional} representation of uncertainty for \textit{individual} measurements. We showcase this ability by performing an in-silico uncertainty analysis of five biomarkers of clinical interest comparing several measurement modalities. Beyond the corroboration of known facts, such as the feasibility of estimating heart rate, our study highlights the potential of estimating new biomarkers from standard-of-care measurements. SBI reveals practically relevant findings that cannot be captured by standard sensitivity analyses, such as the existence of sub-populations for which parameter estimation exhibits distinct uncertainty regimes. Finally, we study the gap between in-vivo and in-silico with the MIMIC-III waveform database and critically discuss how cardiovascular simulations can inform real-world data analysis.
Proximal nested sampling with data-driven priors for physical scientists
McEwen, Jason D., Liaudat, Tobรญas I., Price, Matthew A., Cai, Xiaohao, Pereyra, Marcelo
Proximal nested sampling was introduced recently to open up Bayesian model selection for high-dimensional problems such as computational imaging. The framework is suitable for models with a log-convex likelihood, which are ubiquitous in the imaging sciences. The purpose of this article is two-fold. First, we review proximal nested sampling in a pedagogical manner in an attempt to elucidate the framework for physical scientists. Second, we show how proximal nested sampling can be extended in an empirical Bayes setting to support data-driven priors, such as deep neural networks learned from training data.
Uncertainty in Natural Language Generation: From Theory to Applications
Baan, Joris, Daheim, Nico, Ilia, Evgenia, Ulmer, Dennis, Li, Haau-Sing, Fernรกndez, Raquel, Plank, Barbara, Sennrich, Rico, Zerva, Chrysoula, Aziz, Wilker
Recent advances of powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications. As such, it is crucial that NLG systems are trustworthy and reliable, for example by indicating when they are likely to be wrong; and supporting multiple views, backgrounds and writing styles -- reflecting diverse human sub-populations. In this paper, we argue that a principled treatment of uncertainty can assist in creating systems and evaluation protocols better aligned with these goals. We first present the fundamental theory, frameworks and vocabulary required to represent uncertainty. We then characterise the main sources of uncertainty in NLG from a linguistic perspective, and propose a two-dimensional taxonomy that is more informative and faithful than the popular aleatoric/epistemic dichotomy. Finally, we move from theory to applications and highlight exciting research directions that exploit uncertainty to power decoding, controllable generation, self-assessment, selective answering, active learning and more.
Case Studies of Causal Discovery from IT Monitoring Time Series
Aรฏt-Bachir, Ali, Assaad, Charles K., de Bignicourt, Christophe, Devijver, Emilie, Ferreira, Simon, Gaussier, Eric, Mohanna, Hosein, Zan, Lei
Information technology (IT) systems are vital for modern businesses, handling data storage, communication, and process automation. Monitoring these systems is crucial for their proper functioning and efficiency, as it allows collecting extensive observational time series data for analysis. The interest in causal discovery is growing in IT monitoring systems as knowing causal relations between different components of the IT system helps in reducing downtime, enhancing system performance and identifying root causes of anomalies and incidents. It also allows proactive prediction of future issues through historical data analysis. Despite its potential benefits, applying causal discovery algorithms on IT monitoring data poses challenges, due to the complexity of the data. For instance, IT monitoring data often contains misaligned time series, sleeping time series, timestamp errors and missing values. This paper presents case studies on applying causal discovery algorithms to different IT monitoring datasets, highlighting benefits and ongoing challenges.
Automating Model Comparison in Factor Graphs
van Erp, Bart, Nuijten, Wouter W. L., van de Laar, Thijs, de Vries, Bert
The famous aphorism of George Box states: "all models are wrong, but some are useful" [1]. It is the task of statisticians and data analysts to find a model which is most useful for a given problem. The build, compute, critique and repeat cycle [2], also known as Box's loop [3], is an iterative approach for finding the most useful model. Any efforts in shortening this design cycle increase the chances of developing more useful models, which in turn might yield more reliable predictions, more profitable returns or more efficient operations for the problem at hand. In this paper we choose to adopt the Bayesian formalism and therefore we will specify all tasks in Box's loop as principled probabilistic inference tasks. In addition to the well-known parameter and state inference tasks, the critique step in the design cycle is also phrased as an inference task, known as Bayesian model comparison, which automatically embodies Occam's razor [4, Ch. 28.1]. Opposed to just selecting a single model in the critique step, for different models we better quantify our confidence about which model is best, especially when data is limited [5, Ch. 18.5.1]. The uncertainty arising from prior beliefs p(m) over a set of models m and limited observations can be naturally included through the use of Bayes' theorem
Consistent Range Approximation for Fair Predictive Modeling
Zhu, Jiongli, Galhotra, Sainyam, Sabri, Nazanin, Salimi, Babak
This paper proposes a novel framework for certifying the fairness of predictive models trained on biased data. It draws from query answering for incomplete and inconsistent databases to formulate the problem of consistent range approximation (CRA) of fairness queries for a predictive model on a target population. The framework employs background knowledge of the data collection process and biased data, working with or without limited statistics about the target population, to compute a range of answers for fairness queries. Using CRA, the framework builds predictive models that are certifiably fair on the target population, regardless of the availability of external data during training. The framework's efficacy is demonstrated through evaluations on real data, showing substantial improvement over existing state-of-the-art methods.
No-Regret Constrained Bayesian Optimization of Noisy and Expensive Hybrid Models using Differentiable Quantile Function Approximations
This paper investigates the problem of efficient constrained global optimization of hybrid models that are a composition of a known white-box function and an expensive multi-output black-box function subject to noisy observations, which often arises in real-world science and engineering applications. We propose a novel method, Constrained Upper Quantile Bound (CUQB), to solve such problems that directly exploits the composite structure of the objective and constraint functions that we show leads substantially improved sampling efficiency. CUQB is a conceptually simple, deterministic approach that avoid constraint approximations used by previous methods. Although the CUQB acquisition function is not available in closed form, we propose a novel differentiable sample average approximation that enables it to be efficiently maximized. We further derive bounds on the cumulative regret and constraint violation under a non-parametric Bayesian representation of the black-box function. Since these bounds depend sublinearly on the number of iterations under some regularity assumptions, we establis bounds on the convergence rate to the optimal solution of the original constrained problem. In contrast to most existing methods, CUQB further incorporates a simple infeasibility detection scheme, which we prove triggers in a finite number of iterations when the original problem is infeasible (with high probability given the Bayesian model). Numerical experiments on several test problems, including environmental model calibration and real-time optimization of a reactor system, show that CUQB significantly outperforms traditional Bayesian optimization in both constrained and unconstrained cases. Furthermore, compared to other state-of-the-art methods that exploit composite structure, CUQB achieves competitive empirical performance while also providing substantially improved theoretical guarantees.
Network Fault-tolerant and Byzantine-resilient Social Learning via Collaborative Hierarchical Non-Bayesian Learning
Mclaughlin, Connor, Ding, Matthew, Edogmus, Denis, Su, Lili
--As the network scale increases, existing fully distributed solutions start to lag behind the real-world challenges such as (1) slow information propagation, (2) network communication failures, and (3) external adversarial attacks. In this paper, we focus on hierarchical system architecture and address the problem of non-Bayesian learning over networks that are vulnerable to communication failures and adversarial attacks. On network communication, we consider packet-dropping link failures. We first propose a hierarchical robust push-sum algorithm that can achieve average consensus despite frequent packet-dropping link failures. We provide a sparse information fusion rule between the parameter server and arbitrarily selected network representatives. Then, interleaving the consensus update step with a dual averaging update with Kullback-Leibler (KL) divergence as the proximal function, we obtain a packet-dropping fault-tolerant non-Bayesian learning algorithm with provable convergence guarantees. On external adversarial attacks, we consider Byzantine attacks in which the compromised agents can send maliciously calibrated messages to others (including both the agents and the parameter server). T o avoid the curse of dimensionality of Byzantine consensus, we solve the non-Bayesian learning problem via running multiple dynamics, each of which only involves Byzantine consensus with scalar inputs. T o facilitate resilient information propagation across sub-networks, we use a novel Byzantine-resilient gossiping-type rule at the parameter server . As the scale of the multi-agent network increases, existing fully distributed solutions start to lag behind the crucial real-world challenges such as (1) slow information propagation, (2) network communication failures, and (3) external adversarial attacks.
Rapid and Scalable Bayesian AB Testing
Chennu, Srivas, Maher, Andrew, Pangerl, Christian, Prabanantham, Subash, Bae, Jae Hyeon, Martin, Jamie, Goswami, Bud
AB testing aids business operators with their decision making, and is considered the gold standard method for learning from data to improve digital user experiences. However, there is usually a gap between the requirements of practitioners, and the constraints imposed by the statistical hypothesis testing methodologies commonly used for analysis of AB tests. These include the lack of statistical power in multivariate designs with many factors, correlations between these factors, the need of sequential testing for early stopping, and the inability to pool knowledge from past tests. Here, we propose a solution that applies hierarchical Bayesian estimation to address the above limitations. In comparison to current sequential AB testing methodology, we increase statistical power by exploiting correlations between factors, enabling sequential testing and progressive early stopping, without incurring excessive false positive risk. We also demonstrate how this methodology can be extended to enable the extraction of composite global learnings from past AB tests, to accelerate future tests. We underpin our work with a solid theoretical framework that articulates the value of hierarchical estimation. We demonstrate its utility using both numerical simulations and a large set of real-world AB tests. Together, these results highlight the practical value of our approach for statistical inference in the technology industry.