Bayesian Learning
Inference for Regression with Variables Generated from Unstructured Data
Battaglia, Laura, Christensen, Timothy, Hansen, Stephen, Sacher, Szymon
The leading strategy for analyzing unstructured data uses two steps. First, latent variables of economic interest are estimated with an upstream information retrieval model. Second, the estimates are treated as "data" in a downstream econometric model. We establish theoretical arguments for why this two-step strategy leads to biased inference in empirically plausible settings. More constructively, we propose a one-step strategy for valid inference that uses the upstream and downstream models jointly. The one-step strategy (i) substantially reduces bias in simulations; (ii) has quantitatively important effects in a leading application using CEO time-use data; and (iii) can be readily adapted by applied researchers.
Efficient semi-supervised inference for logistic regression under case-control studies
Quan, Zhuojun, Lin, Yuanyuan, Chen, Kani, Yu, Wen
Semi-supervised learning has received increasing attention in statistics and machine learning. In semi-supervised learning settings, a labeled data set with both outcomes and covariates and an unlabeled data set with covariates only are collected. We consider an inference problem in semi-supervised settings where the outcome in the labeled data is binary and the labeled data is collected by case-control sampling. Case-control sampling is an effective sampling scheme for alleviating imbalance in binary data. Under the logistic model assumption, case-control data can still provide a consistent estimator for the slope parameter of the regression model. However, the intercept parameter is not identifiable. Consequently, the marginal case proportion cannot be estimated from case-control data. We find that, with the availability of the unlabeled data, the intercept parameter can be identified in the semi-supervised learning setting. We construct the likelihood function of the observed labeled and unlabeled data and obtain the maximum likelihood estimator via an iterative algorithm. The proposed estimator is shown to be consistent, asymptotically normal, and semiparametrically efficient. Extensive simulation studies are conducted to show the finite-sample performance of the proposed method. The results imply that the unlabeled data not only helps to identify the intercept but also improves the estimation efficiency of the slope parameter. Meanwhile, the marginal case proportion can be estimated accurately by the proposed method.
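The non-identifiability of the intercept mentioned above can be made concrete with the standard textbook argument for logistic regression under case-control sampling. The notation here is assumed for illustration and is not taken from the paper: \alpha and \beta are the population intercept and slope, n_1 and n_0 are the numbers of sampled cases and controls, and \pi = P(Y=1) is the marginal case proportion.

```latex
% Population model
\[
P(Y = 1 \mid X = x) = \frac{\exp(\alpha + \beta^\top x)}{1 + \exp(\alpha + \beta^\top x)}
\]
% Model that holds within the case-control sample
\[
P(Y = 1 \mid X = x,\ \text{sampled})
  = \frac{\exp(\alpha^{*} + \beta^\top x)}{1 + \exp(\alpha^{*} + \beta^\top x)},
\qquad
\alpha^{*} = \alpha + \log\frac{n_1}{n_0} - \log\frac{\pi}{1 - \pi}
\]
```

The slope \beta is unchanged by the sampling scheme, whereas only \alpha^{*} can be estimated from the labeled data. Recovering \alpha requires \pi, which is the kind of information about the marginal distribution that unlabeled covariate data can supply.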
Rapid Bayesian identification of sparse nonlinear dynamics from scarce and noisy data
Fung, Lloyd, Fasel, Urban, Juniper, Matthew P.
The pursuit of direct model equation discovery has been an ongoing and significant area of interest in scientific machine learning. The popular sparse identification of nonlinear dynamics (SINDy) framework [1] offers a promising approach to extract parsimonious equations directly from data. SINDy's promotion of parsimony by sparse regression allows for the identification of an interpretable model that balances accuracy with generalizability, while its simplicity leads to a relatively efficient and fast learning process compared to other machine learning techniques. The framework has been successfully applied in a variety of applications, such as model identification in plasma physics [2], control engineering [3, 4], biological transport problems [5], socio-cognitive systems [6], epidemiology [7, 8] and turbulence modelling [9]. Furthermore, its remarkable extendibility has attracted a range of modifications, including the adaptation to discover partial differential equations [10], the extension to libraries of rational functions [11], the integration of ensembling techniques to improve data efficiency [12] and the use of weak formulations [13, 14] to avoid noise amplification when computing derivatives from discrete data. One major difficulty in using scientific machine learning methods in fields such as biophysics, ecology, and microbiology is that measured data from these fields is often noisy and scarce.
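The sparse regression step that SINDy relies on can be sketched as follows. This is a minimal illustration using sequentially thresholded least squares, a commonly used SINDy optimizer, and not the Bayesian method proposed in the paper. The helper names (`build_library`, `stlsq`), the polynomial library, the toy system, and the threshold value are all assumptions chosen for the example.

```python
import numpy as np

def build_library(x, y):
    """Candidate terms [1, x, y, x^2, x*y, y^2] evaluated at each sample (hypothetical library)."""
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small coefficients, refit on the rest."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                # Refit only the terms that survived thresholding for state k.
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], dxdt[:, k], rcond=None)[0]
    return xi

# Toy data (assumed): dx/dt = -2x, dy/dt = x - 0.5y, with small measurement noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = rng.uniform(-1, 1, 200)
dxdt = np.column_stack([-2.0 * x, x - 0.5 * y])
dxdt += 0.01 * rng.standard_normal(dxdt.shape)

xi = stlsq(build_library(x, y), dxdt)
print(np.round(xi, 2))  # rows: library terms, columns: state equations
```

With ample, low-noise data the thresholding separates true terms from spurious ones easily. The noisy, scarce regime described above is where a fixed hard threshold becomes fragile, which motivates probabilistic alternatives.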
Accelerating Convergence of Stein Variational Gradient Descent via Deep Unfolding
Kawamura, Yuya, Takabe, Satoshi
Stein variational gradient descent (SVGD) is a prominent particle-based variational inference method used for sampling a target distribution. SVGD has attracted interest for application in machine-learning techniques such as Bayesian inference. In this paper, we propose novel trainable algorithms that incorporate a deep-learning technique called deep unfolding into SVGD. This approach facilitates the learning of the internal parameters of SVGD, thereby accelerating its convergence speed. To evaluate the proposed trainable SVGD algorithms, we conducted numerical simulations of three tasks: sampling a one-dimensional Gaussian mixture, performing Bayesian logistic regression, and learning Bayesian neural networks. The results show that our proposed algorithms exhibit faster convergence than the conventional variants of SVGD.
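For orientation, the conventional SVGD update that the trainable variants build on can be sketched as below. This is a plain fixed-step version with an RBF kernel, not the deep-unfolded algorithm from the paper. The function names, the kernel bandwidth `h`, the step size, and the Gaussian toy target are assumptions made for the example; the step size is the sort of internal parameter that deep unfolding would learn per iteration.

```python
import numpy as np

def rbf_kernel(x, h):
    """RBF kernel matrix and its gradient with respect to the first argument.

    x has shape (n, d). Returns k with k[j, i] = exp(-||x_j - x_i||^2 / h)
    and grad_k with grad_k[j, i] = d k(x_j, x_i) / d x_j.
    """
    diff = x[:, None, :] - x[None, :, :]          # diff[j, i] = x_j - x_i
    k = np.exp(-np.sum(diff**2, axis=-1) / h)
    grad_k = -2.0 / h * diff * k[:, :, None]
    return k, grad_k

def svgd_step(x, grad_logp, step, h):
    """One SVGD update: kernel-weighted score (attraction) plus kernel gradient (repulsion)."""
    n = x.shape[0]
    k, grad_k = rbf_kernel(x, h)
    phi = (k.T @ grad_logp(x) + grad_k.sum(axis=0)) / n
    return x + step * phi

def run_svgd(x0, grad_logp, n_iter=500, step=0.1, h=1.0):
    """Apply a fixed number of SVGD steps with a constant step size."""
    x = x0.copy()
    for _ in range(n_iter):
        x = svgd_step(x, grad_logp, step, h)
    return x

# Toy target (assumed): a one-dimensional Gaussian N(2, 1), whose score is -(x - 2).
rng = np.random.default_rng(0)
init = rng.normal(-3.0, 1.0, size=(50, 1))
particles = run_svgd(init, lambda x: -(x - 2.0))
print(particles.mean(), particles.std())
```

The first term pulls particles toward high-density regions and the second keeps them from collapsing onto a single mode.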
Learning and Sustaining Shared Normative Systems via Bayesian Rule Induction in Markov Games
Oldenburg, Ninell, Zhi-Xuan, Tan
A universal feature of human societies is the adoption of systems of rules and norms in the service of cooperative ends. How can we build learning agents that do the same, so that they may flexibly cooperate with the human institutions they are embedded in? We hypothesize that agents can achieve this by assuming there exists a shared set of norms that most others comply with while pursuing their individual desires, even if they do not know the exact content of those norms. By assuming shared norms, a newly introduced agent can infer the norms of an existing population from observations of compliance and violation. Furthermore, groups of agents can converge to a shared set of norms, even if they initially diverge in their beliefs about what the norms are. This in turn enables the stability of the normative system: since agents can bootstrap common knowledge of the norms, this leads the norms to be widely adhered to, enabling new entrants to rapidly learn those norms. We formalize this framework in the context of Markov games and demonstrate its operation in a multi-agent environment via approximately Bayesian rule induction of obligative and prohibitive norms. Using our approach, agents are able to rapidly learn and sustain a variety of cooperative institutions, including resource management norms and compensation for pro-social labor, promoting collective welfare while still allowing agents to act in their own interests.
Towards Automated Causal Discovery: a case study on 5G telecommunication data
Biza, Konstantina, Ntroumpogiannis, Antonios, Triantafillou, Sofia, Tsamardinos, Ioannis
Causal Discovery is a field of machine learning and statistics aiming to induce causal knowledge from data [29, 47]. There is a large corpus of algorithms and methodologies in the field, spanning tasks like learning causal models, estimating causal effects, and determining optimal interventions. While there are several public libraries of algorithms for these tasks, combining the algorithms and applying them to any given problem is a challenging endeavor that requires extensive knowledge of the methods and a deep understanding of the theory to interpret results. In this paper, we introduce the concept of Automated Causal Discovery (AutoCD) (not to be confused with Automated Causal Inference [14, 26]; see Section 3), defined as the effort to fully automate the application of causal discovery and causal reasoning. AutoCD's goals should be to deliver not just the optimal causal model that fits the data, but all information, answers to queries, visualizations, interpretations, and explanations that a human expert analyst would.
Stacking Factorizing Partitioned Expressions in Hybrid Bayesian Network Models
Lin, Peng, Neil, Martin, Fenton, Norman
Hybrid Bayesian networks (HBNs) contain complex conditional probability distributions (CPDs) specified as partitioned expressions over discrete and continuous variables. The size of these CPDs grows exponentially with the number of parent nodes when using discrete inference, resulting in significant inefficiency. Normally, an effective way to reduce the CPD size is to use a binary factorization (BF) algorithm to decompose the statistical or arithmetic functions in the CPD by factorizing the number of connected parent nodes to sets of size two. However, the BF algorithm was not designed to handle partitioned expressions. Hence, we propose a new algorithm called stacking factorization (SF) to decompose the partitioned expressions. The SF algorithm creates intermediate nodes to incrementally reconstruct the densities in the original partitioned expression, allowing no more than two continuous parent nodes to be connected to each child node in the resulting HBN. SF can be either used independently or combined with the BF algorithm. We show that the SF+BF algorithm significantly reduces the CPD size and contributes to lowering the tree-width of a model, thus improving efficiency.
Nonlinear Bayesian optimal experimental design using logarithmic Sobolev inequalities
Li, Fengyi, Belhadji, Ayoub, Marzouk, Youssef
The optimal experimental design (OED) problem arises in numerous settings, with applications ranging from combustion kinetics (Huan & Marzouk, 2013) and sensor placement for weather prediction (Krause et al., 2008) to contaminant source identification (Attia et al., 2018) and pharmaceutical trials (Djuris et al., 2024). A commonly addressed version of the OED problem centers on the fundamental question of selecting an optimal subset of k observations from a total pool of n possible candidates, with the goal of learning the parameters of a statistical model for the observations. In the Bayesian framework, these parameters are endowed with a prior distribution to represent our state of knowledge before seeing the data. A posterior distribution on the parameters is obtained by conditioning on the observations. A commonly used experimental design criterion is then the mutual information (MI) between the parameters and the selected observations or, equivalently, the expected information gain from prior to posterior, which should be maximized.
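The equivalence between mutual information and expected information gain stated above can be written out explicitly. The notation is assumed for illustration: \theta denotes the parameters, S is a candidate subset of k indices out of n, and y_S denotes the corresponding observations.

```latex
\[
I(\theta;\, y_S)
  = \iint p(\theta, y_S)\, \log \frac{p(\theta \mid y_S)}{p(\theta)}\, d\theta\, dy_S
  = \mathbb{E}_{y_S}\!\left[ D_{\mathrm{KL}}\!\big( p(\theta \mid y_S) \,\|\, p(\theta) \big) \right]
\]
\[
S^{\star} \in \operatorname*{arg\,max}_{S \subseteq \{1, \dots, n\},\ |S| = k} I(\theta;\, y_S)
\]
```

The inner Kullback-Leibler divergence is the information gained from prior to posterior for one realization of the data, and the outer expectation averages it over the data that the design S could produce.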
Rao-Blackwellising Bayesian Causal Inference
Toth, Christian, Knoll, Christian, Pernkopf, Franz, Peharz, Robert
Bayesian causal inference, i.e., inferring a posterior over causal models for use in downstream causal reasoning tasks, poses a hard computational inference problem that is little explored in the literature. In this work, we combine techniques from order-based MCMC structure learning with recent advances in gradient-based graph learning into an effective Bayesian causal inference framework. Specifically, we decompose the problem of inferring the causal structure into (i) inferring a topological order over variables and (ii) inferring the parent sets for each variable. When limiting the number of parents per variable, we can exactly marginalise over the parent sets in polynomial time. We further use Gaussian processes to model the unknown causal mechanisms, which also allows their exact marginalisation. This introduces a Rao-Blackwellization scheme, where all components are eliminated from the model, except for the causal order, for which we learn a distribution via gradient-based optimisation. The combination of Rao-Blackwellization with our sequential inference procedure for causal orders yields state-of-the-art performance on linear and non-linear additive noise benchmarks with scale-free and Erdős-Rényi graph structures.
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
Aouali, Imad, Brunel, Victor-Emmanuel, Rohde, David, Korba, Anna
In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach designed for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.