 Bayesian Inference


Causal Structure Learning by Using Intersection of Markov Blankets

arXiv.org Artificial Intelligence

In this paper, we introduce a novel causal structure learning algorithm called Endogenous and Exogenous Markov Blankets Intersection (EEMBI), which combines the properties of Bayesian networks and Structural Causal Models (SCMs). Exogenous variables are special variables used in SCMs. We find that exogenous variables have some special characteristics and that these characteristics remain useful under the properties of a Bayesian network. EEMBI intersects the Markov blankets of exogenous variables with the Markov blankets of endogenous variables, i.e. the original variables, to remove irrelevant connections and, in theory, recover the true causal structure. Furthermore, we propose an extended version of EEMBI, namely EEMBI-PC, which integrates the last step of the PC algorithm into EEMBI. This modification enhances the algorithm's performance by leveraging the strengths of both approaches. Extensive experiments show that EEMBI and EEMBI-PC achieve state-of-the-art performance on both discrete and continuous datasets.
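For readers unfamiliar with the term, the following is a minimal sketch of the standard Markov blanket definition (parents, children, and the children's other parents) on a known DAG. It is not the EEMBI algorithm, which has to estimate blankets from data; the function name `markov_blanket` and the toy graph are assumptions made for illustration only.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], node: str) -> Set[str]:
    """Return the Markov blanket of `node` in a DAG given as a node -> parents map."""
    # Children are the nodes that list `node` among their parents.
    children = {v for v, ps in parents.items() if node in ps}
    # Spouses are the other parents of those children.
    spouses = {p for c in children for p in parents[c]}
    blanket = set(parents.get(node, set())) | children | spouses
    blanket.discard(node)  # a node is never part of its own blanket
    return blanket


# Hypothetical toy DAG: A -> C <- B, C -> D
toy_dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
```

In this toy graph the blanket of `A` contains its child `C` and its co-parent `B`, while `D` is excluded because `C` screens it off.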


Fuzzy-Conditioned Diffusion and Diffusion Projection Attention Applied to Facial Image Correction

arXiv.org Artificial Intelligence

Image diffusion has recently shown remarkable performance in image synthesis and implicitly as an image prior. Such a prior has been used with conditioning to solve the inpainting problem, but only supporting binary user-based conditioning. We derive a fuzzy-conditioned diffusion, where implicit diffusion priors can be exploited with controllable strength. Our fuzzy conditioning can be applied pixel-wise, enabling the modification of different image components to varying degrees. Additionally, we propose an application to facial image correction, where we combine our fuzzy-conditioned diffusion with diffusion-derived attention maps. Our map estimates the degree of anomaly, and we obtain it by projecting on the diffusion space. We show how our approach also leads to interpretable and autonomous facial image correction.


Reconstructing Graph Diffusion History from a Single Snapshot

arXiv.org Artificial Intelligence

Diffusion on graphs is ubiquitous with numerous high-impact applications. In these applications, complete diffusion histories play an essential role in identifying dynamical patterns, reflecting on precaution actions, and forecasting intervention effects. Despite their importance, complete diffusion histories are rarely available and are highly challenging to reconstruct due to ill-posedness, explosive search space, and scarcity of training data. To date, few methods exist for diffusion history reconstruction. They are exclusively based on the maximum likelihood estimation (MLE) formulation and require knowledge of the true diffusion parameters. In this paper, we study an even harder problem, namely reconstructing Diffusion history from A single SnapsHot (DASH), where we seek to reconstruct the history from only the final snapshot without knowing the true diffusion parameters. We start with theoretical analyses that reveal a fundamental limitation of the MLE formulation. We prove: (a) estimation error of diffusion parameters is unavoidable due to NP-hardness of diffusion parameter estimation, and (b) the MLE formulation is sensitive to estimation error of diffusion parameters. To overcome the inherent limitation of the MLE formulation, we propose a novel barycenter formulation: finding the barycenter of the posterior distribution of histories, which is provably stable against the estimation error of diffusion parameters. We further develop an effective solver named DIffusion hiTting Times with Optimal proposal (DITTO) by reducing the problem to estimating posterior expected hitting times via the Metropolis-Hastings Markov chain Monte Carlo method (M-H MCMC) and employing an unsupervised graph neural network to learn an optimal proposal to accelerate the convergence of M-H MCMC. We conduct extensive experiments to demonstrate the efficacy of the proposed method.
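As background on the sampling step named in this abstract, here is a generic random-walk Metropolis-Hastings sketch for a one-dimensional target. It is not DITTO: the paper samples diffusion histories with a learned proposal, whereas this example assumes a fixed Gaussian proposal and a toy target. The names `metropolis_hastings` and `log_standard_normal` are hypothetical.

```python
import math
import random
from typing import Callable, List


def metropolis_hastings(log_target: Callable[[float], float], x0: float,
                        n_samples: int, step: float = 1.0,
                        seed: int = 0) -> List[float]:
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        candidate = x + rng.gauss(0.0, step)
        log_p_candidate = log_target(candidate)
        # Symmetric proposal, so the acceptance ratio is target(candidate) / target(x).
        if log_p_candidate >= log_p or rng.random() < math.exp(log_p_candidate - log_p):
            x, log_p = candidate, log_p_candidate
        samples.append(x)  # a rejected move repeats the current state
    return samples


def log_standard_normal(x: float) -> float:
    """Unnormalised log-density of N(0, 1), used here only as a toy target."""
    return -0.5 * x * x


draws = metropolis_hastings(log_standard_normal, x0=0.0, n_samples=20000)
```

A proposal that better matches the posterior raises the acceptance rate and shortens mixing time, which is the role the abstract assigns to the learned proposal.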


Truth Discovery in Sequence Labels from Crowds

arXiv.org Artificial Intelligence

Annotation quality and quantity positively affect the learning performance of sequence labeling, a vital task in Natural Language Processing. Hiring domain experts to annotate a corpus is very costly in terms of money and time. Crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), have been deployed to assist in this purpose. However, the annotations collected this way are prone to human errors due to the lack of expertise of the crowd workers. Existing literature in annotation aggregation assumes that annotations are independent and thus faces challenges when handling sequential label aggregation tasks with complex dependencies. To address these challenges, we propose an optimization-based method that infers the ground truth labels using annotations provided by workers for sequential labeling tasks. The proposed Aggregation method for Sequential Labels from Crowds (AggSLC) jointly considers the characteristics of sequential labeling tasks, workers' reliabilities, and advanced machine learning techniques. Theoretical analysis of the algorithm's convergence further demonstrates that the proposed AggSLC halts after a finite number of iterations. We evaluate AggSLC on different crowdsourced datasets for Named Entity Recognition (NER) tasks and Information Extraction tasks in the biomedical domain (PICO), as well as a simulated dataset. Our results show that the proposed method outperforms the state-of-the-art aggregation methods. To gain insight into the framework, we study the effectiveness of AggSLC's components through ablation studies.


Analysis of Climate Campaigns on Social Media using Bayesian Model Averaging

arXiv.org Artificial Intelligence

Climate change is the defining issue of our time, and we are at a defining moment. Various interest groups, social movement organizations, and individuals engage in collective action on this issue on social media. In addition, issue advocacy campaigns on social media often arise in response to ongoing societal concerns, especially those faced by energy industries. Our goal in this paper is to analyze how those industries, their advocacy groups, and climate advocacy groups use social media to influence the narrative on climate change. In this work, we propose a minimally supervised model soup [57] approach combined with messaging themes to identify the stances of climate ads on Facebook. Finally, we release our stance dataset, model, and set of themes related to climate campaigns for future work on opinion mining and the automatic detection of climate change stances.


Parameter Identification for Partial Differential Equations with Spatiotemporal Varying Coefficients

arXiv.org Artificial Intelligence

To comprehend complex systems with multiple states, it is imperative to reveal the identity of these states from system outputs. Nevertheless, the mathematical models describing these systems are often nonlinear, which makes solving the parameter inverse problem from observed spatiotemporal data challenging. Starting from the observed data obtained from such systems, we propose a novel framework for parameter identification in multi-state systems governed by parametric partial differential equations with spatiotemporally varying coefficients. Our framework consists of two integral components: a constrained self-adaptive physics-informed neural network, encompassing a sub-network, as our methodology for parameter identification, and a finite mixture model approach to detect regions of probable parameter variations. Through our scheme, we can precisely ascertain the unknown varying parameters of the complex multi-state system, thereby accomplishing the inversion of the varying parameters. Furthermore, we showcase the efficacy of our framework on two numerical cases: the 1D Burgers' equation with time-varying parameters and the 2D wave equation with a space-varying parameter.


First-Order Context-Specific Likelihood Weighting in Hybrid Probabilistic Logic Programs

Journal of Artificial Intelligence Research

Statistical relational AI and probabilistic logic programming have so far mostly focused on discrete probabilistic models. The reason for this is that one needs to provide constructs to succinctly model the independencies in such models, and also provide efficient inference. Three types of independencies are important to represent and exploit for scalable inference in hybrid models: conditional independencies elegantly modeled in Bayesian networks, context-specific independencies naturally represented by logical rules, and independencies amongst attributes of related objects in relational models succinctly expressed by combining rules. This paper introduces a hybrid probabilistic logic programming language, DC#, which integrates the syntax of distributional clauses with the semantic principles of Bayesian logic programs. It represents the three types of independencies qualitatively. More importantly, we also introduce the scalable inference algorithm FO-CS-LW for DC#. FO-CS-LW is a first-order extension of the context-specific likelihood weighting algorithm (CS-LW), a novel sampling method that exploits conditional independencies and context-specific independencies in ground models. The FO-CS-LW algorithm upgrades CS-LW with unification and combining rules to the first-order case.
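As background, the following sketch shows plain likelihood weighting, the classical sampler that CS-LW and FO-CS-LW build on, for a two-node discrete network. It does not implement context-specific or first-order reasoning. The network (Rain -> WetGrass), its probabilities, and the function name are all hypothetical choices for illustration.

```python
import random


def estimate_rain_given_wet(n_samples: int, seed: int = 0) -> float:
    """Estimate P(Rain = true | WetGrass = true) by likelihood weighting."""
    rng = random.Random(seed)
    p_rain = 0.2                                   # assumed prior P(Rain = true)
    p_wet_given_rain = {True: 0.9, False: 0.1}     # assumed P(WetGrass = true | Rain)
    weighted_rain = 0.0
    total_weight = 0.0
    for _ in range(n_samples):
        # Sample the non-evidence variable from its prior.
        rain = rng.random() < p_rain
        # The evidence variable is not sampled; it contributes its likelihood as a weight.
        weight = p_wet_given_rain[rain]
        total_weight += weight
        if rain:
            weighted_rain += weight
    return weighted_rain / total_weight


estimate = estimate_rain_given_wet(50000)
```

With these assumed numbers the exact posterior is 0.18 / (0.18 + 0.08), roughly 0.69, so the weighted estimate should land close to that value.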


The most likely common cause

arXiv.org Artificial Intelligence

The common cause principle for two random variables $A$ and $B$ is examined in the case of causal insufficiency, when their common cause $C$ is known to exist, but only the joint probability of $A$ and $B$ is observed. As a result, $C$ cannot be uniquely identified (the latent confounder problem). We show that the generalized maximum likelihood method can be applied to this situation and allows identification of $C$ that is consistent with the common cause principle. It closely relates to the maximum entropy principle. Investigation of the two binary symmetric variables reveals a non-analytic behavior of conditional probabilities reminiscent of a second-order phase transition. This occurs during the transition from correlation to anti-correlation in the observed probability distribution. The relation between the generalized likelihood approach and alternative methods, such as predictive likelihood and the minimum common cause entropy, is discussed. The consideration of the common cause for three observed variables (and one hidden cause) uncovers causal structures that defy representation through directed acyclic graphs with the Markov condition.


Redeeming Data Science by Decision Modelling

arXiv.org Artificial Intelligence

With the explosion of applications of Data Science, the field has come loose from its foundations. This article argues for a new program of applied research, in areas familiar to researchers in Bayesian methods in AI, that is needed to ground the practice of Data Science by borrowing from AI techniques for model formulation that we term "Decision Modelling." This article briefly reviews the formulation process as building a causal graphical model, then discusses the process in terms of six principles that comprise Decision Quality, a framework from the popular business literature. We claim that any successful applied ML modelling effort must include these six principles. We explain how Decision Modelling combines a conventional machine learning model with an explicit value model. To give a specific example, we show how this is done by integrating a model's ROC curve with a utility model.
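One way to read the ROC-plus-utility idea is as picking the operating point that maximises expected utility. The sketch below illustrates that reading under stated assumptions; it is not taken from the article, and the function names, the toy ROC points, the prevalence, and the utility values are all hypothetical.

```python
from typing import List, Tuple


def expected_utility(fpr: float, tpr: float, prevalence: float,
                     u_tp: float, u_fp: float, u_tn: float, u_fn: float) -> float:
    """Expected utility per case at one ROC operating point."""
    positives = prevalence * (tpr * u_tp + (1.0 - tpr) * u_fn)
    negatives = (1.0 - prevalence) * (fpr * u_fp + (1.0 - fpr) * u_tn)
    return positives + negatives


def best_operating_point(roc: List[Tuple[float, float]], prevalence: float,
                         u_tp: float, u_fp: float, u_tn: float,
                         u_fn: float) -> Tuple[float, float]:
    """Return the (fpr, tpr) pair with the highest expected utility."""
    return max(roc, key=lambda pt: expected_utility(
        pt[0], pt[1], prevalence, u_tp, u_fp, u_tn, u_fn))


# Hypothetical ROC points as (false positive rate, true positive rate).
toy_roc = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.9), (1.0, 1.0)]
chosen = best_operating_point(toy_roc, prevalence=0.2,
                              u_tp=10.0, u_fp=-1.0, u_tn=0.0, u_fn=-5.0)
```

Changing the utilities or the prevalence moves the preferred point along the curve, which is why the value model has to be explicit rather than implied by a default threshold.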


Hierarchical Bayesian Regression for Multi-Location Sales Transaction Forecasting

arXiv.org Artificial Intelligence

The features in many prediction models naturally take the form of a hierarchy. The lower levels represent individuals or events. These units group naturally into locations and intervals or other aggregates, often at multiple levels. Levels of groupings may intersect and join, much as relational database tables do. Besides representing the structure of the data, predictive features in hierarchical models can be assigned to their proper levels. Such models lend themselves to hierarchical Bayes solution methods that "share" results of inference between groups by generalizing over the case of individual models for each group versus one model that aggregates all groups into one. In this paper we show our work-in-progress applying a hierarchical Bayesian model to forecast purchases throughout the day at store franchises, with groupings over locations and days of the week. We demonstrate using the Stan package on individual sales transaction data collected over the course of a year. We show how this solves the dilemma of having limited data and hence modest accuracy for each day and location, while being able to scale to a large number of locations with improved accuracy.