Bayesian Inference
Inverse Ising inference from high-temperature re-weighting of observations
Jo, Junghyo, Hoang, Danh-Tai, Periwal, Vipul
Maximum Likelihood Estimation (MLE) is the bread and butter of system inference for stochastic systems. In some generality, MLE will converge to the correct model in the infinite data limit. In the context of physical approaches to system inference, such as Boltzmann machines, MLE requires the arduous computation of partition functions summing over all configurations, both observed and unobserved. We present here a conceptually and computationally transparent data-driven approach to system inference that is based on the simple question: How should the Boltzmann weights of observed configurations be modified to make the probability distribution of observed configurations close to a flat distribution? This algorithm gives accurate inference by using only observed configurations for systems with a large number of degrees of freedom where other approaches are intractable.
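The cost that the abstract attributes to MLE can be made concrete with a minimal sketch. It is not the authors' re-weighting algorithm; it only shows why the likelihood of a Boltzmann model needs a sum over all 2^n configurations. The function names, the ±1 spin encoding, and the energy sign convention are assumptions made for illustration.

```python
import itertools
import math

def ising_energy(spins, J, h):
    """Energy E(s) = -sum_{i<j} J[i][j] s_i s_j - sum_i h[i] s_i (assumed convention)."""
    n = len(spins)
    pair = sum(J[i][j] * spins[i] * spins[j]
               for i in range(n) for j in range(i + 1, n))
    field = sum(h[i] * spins[i] for i in range(n))
    return -(pair + field)

def log_partition(J, h):
    """Brute-force log Z over all 2**n configurations; feasible only for small n."""
    n = len(h)
    neg_energies = [-ising_energy(s, J, h)
                    for s in itertools.product((-1, 1), repeat=n)]
    # log-sum-exp for numerical stability
    m = max(neg_energies)
    return m + math.log(sum(math.exp(v - m) for v in neg_energies))

def log_likelihood(samples, J, h):
    """Average log-probability of the observed configurations under the model."""
    log_z = log_partition(J, h)
    return sum(-ising_energy(s, J, h) - log_z for s in samples) / len(samples)
```

Every evaluation of `log_likelihood` calls `log_partition`, whose loop doubles in length with each added spin. That exponential growth is the bottleneck an approach using only observed configurations is meant to avoid.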
Large-Scale Local Causal Inference of Gene Regulatory Relationships
Bucur, Ioan Gabriel, Claassen, Tom, Heskes, Tom
Gene regulatory networks play a crucial role in controlling an organism's biological processes, which is why there is significant interest in developing computational methods that are able to extract their structure from high-throughput genetic data. Many of these computational methods are designed to infer individual regulatory relationships among genes from data on gene expression. We propose a novel efficient Bayesian method for discovering local causal relationships among triplets of (normally distributed) variables. In our approach, we score covariance structures for each triplet in one go and incorporate available background knowledge in the form of priors to derive posterior probabilities over local causal structures. Our method is flexible in the sense that it allows for different types of causal structures and assumptions. We apply our approach to the task of learning causal regulatory relationships among genes. We show that the proposed algorithm produces stable and conservative posterior probability estimates over local causal structures that can be used to derive an honest ranking of the most meaningful regulatory relationships. We demonstrate the stability and efficacy of our method both on simulated data and on real-world data from an experiment on yeast.
This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/ .
Introduction Gene regulatory networks (GRNs) play a crucial role in controlling an organism's biological processes, such as cell differentiation and metabolism [1]. If we knew the structure of a GRN, we could intervene in the developmental process of the organism, for instance by targeting a specific gene with drugs. Gene regulatory relationships are inherently causal: one can manipulate the expression level of one gene (the 'cause') to regulate that of another gene (the 'effect'). Because of this, many GRN inference algorithms rely on causal modeling.
Causal networks such as GRNs can be inferred globally or locally.
A Bayesian Approach to Direct and Inverse Abstract Argumentation Problems
This paper studies a fundamental mechanism for detecting a conflict between arguments, given sentiments regarding the acceptability of those arguments. To tackle this, we introduce the inverse problem of abstract argumentation. Given noisy sets of acceptable arguments, it aims to find attack relations that explain the sets well in terms of acceptability semantics. It is the inverse of the direct problem, the traditional problem of abstract argumentation, which focuses on finding sets of acceptable arguments in terms of the semantics given an attack relation between the arguments. We give a probabilistic model that handles both problems in a way that is faithful to the acceptability semantics. From a theoretical point of view, we show that a solution to both the direct and inverse problems is a special case of probabilistic inference on the model. We argue that the model provides a natural extension of the semantics to cope with uncertain attack relations that are distributed probabilistically. From an empirical point of view, we argue that it reasonably predicts individuals' sentiments regarding the acceptability of arguments. This paper helps lay the foundation for making acceptability semantics data-driven and provides a way to tackle the knowledge acquisition bottleneck.
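The direct problem described above, finding acceptable arguments from a given attack relation, can be illustrated with a small sketch. The abstract does not say which acceptability semantics the authors use; grounded semantics is chosen here purely as an assumption, and the function name and data layout are hypothetical.

```python
def grounded_extension(arguments, attacks):
    """Return the grounded extension of an abstract argumentation framework.

    arguments: iterable of hashable argument names
    attacks:   set of (attacker, target) pairs
    """
    arguments = set(arguments)
    # For each argument, collect the arguments that attack it.
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    accepted = set()
    while True:
        # Arguments attacked by something that is already accepted.
        defeated = {y for (x, y) in attacks if x in accepted}
        # An argument is acceptable if every one of its attackers is defeated.
        new = {a for a in arguments if attackers[a] <= defeated}
        if new == accepted:
            return accepted
        accepted = new
```

For a chain in which `a` attacks `b` and `b` attacks `c`, the function returns `{"a", "c"}`: `a` is unattacked, and it defends `c` by defeating `b`. The inverse problem runs the other way, searching for an `attacks` set whose extensions match observed (noisy) sets of accepted arguments.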
Resources for Getting Started With Probability in Machine Learning
Machine learning is a field of computer science concerned with developing systems that can learn from data. Like statistics and linear algebra, probability is another foundational field that supports machine learning. Probability is a field of mathematics concerned with quantifying uncertainty. Many aspects of machine learning are uncertain, including, most critically, observations from the problem domain and the relationships learned by models from that data. As such, a machine learning practitioner needs some understanding of probability, and of the tools and methods used in the field, to be effective.
Bayesian Network Based Risk and Sensitivity Analysis for Production Process Stability Control
Xie, Wei, Wang, Bo, Li, Cheng, Auclair, Jared, Baker, Peter
The biomanufacturing industry is growing rapidly and becoming one of the key drivers of personalized medicine and life science. However, biopharmaceutical production faces critical challenges, including complexity, high variability, long lead time and rapid changes in technologies, processes, and regulatory environment. Driven by these challenges, we explore the biotechnology domain knowledge and propose a rigorous risk and sensitivity analysis framework for biomanufacturing innovation. Built on the causal relationships of raw material quality attributes, production process, and bio-drug properties in safety and efficacy, we develop a Bayesian Network (BN) to model the complex probabilistic interdependence between process parameters and quality attributes of raw materials/in-process materials/drug substance. It integrates various sources of data and leads to an interpretable probabilistic knowledge graph of the end-to-end production process. Then, we introduce a systematic risk analysis to assess the criticality of process parameters and quality attributes. Complex production processes often involve many process parameters and quality attributes that affect product quality variability. However, the real-world (batch) data are often limited, especially for customized and personalized bio-drugs. We propose uncertainty quantification and sensitivity analysis to analyze the impact of model risk. Given very limited process data, the empirical results show that we can provide reliable and inter[...] Thus, the proposed framework can provide science- and risk-based guidance on process monitoring, data collection, and process parameter specifications to facilitate production process learning and stability control.
Keywords: Decision analysis, biomanufacturing, Bayesian network, production process risk analysis, sensitivity analysis. 2017 MSC: 00-01, 99-00
Corresponding author email address: w.xie@northeastern.edu
1. Introduction In the past decades, pharmaceutical companies have invested billions of dollars in the research and development (R&D) of new biomedicines for the treatment of many severe illnesses, including cancer and adult blindness. More than 40 percent of the overall pharmaceutical industry R&D and products in the development pipeline are biopharmaceuticals, and this percentage is expected to keep increasing. Compared to classical pharmaceutical manufacturing, biopharmaceutical production faces several challenges, including complexity, high variability, long lead time and rapid changes in technologies, processes, and regulatory environment (Kaminsky & Wang, 2015). Biotechnology products are produced in living organisms, which introduces considerable uncertainty into the production process.
Incremental learning of environment interactive structures from trajectories of individuals
Campo, Damian, Bastani, Vahid, Marcenaro, Lucio, Regazzoni, Carlo
FORCE FIELD TERMINOLOGY In classical mechanics, a force is defined as a vector quantity that acts on a body to cause a change in its state of motion [25]. Forces can be classified into action-reaction forces (when bodies that are in contact change their momenta [25]) and action-at-a-distance forces (when objects interact without physically touching). Since social interactions can often be modeled as contact-less, it becomes possible to explain social phenomena in a given environment by modeling interactions between entities with action-at-a-distance forces. A force field $\vec{F}$ is defined as a vector point-function that, at every point of the space, takes a particular value related to the magnitude and direction of the force acting on a particle of unit mass placed there [26]. Accordingly, in this work, the particles of unit mass affected by force fields will be called agents. A central force field $\vec{F} = f(r)\,\hat{r}$ is a special case of a force field in which the motion of agents depends on the distance $r$ to a center of force, which is generally associated with the center of mass of the object that produces the force field.
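As a minimal sketch of the central force field just described, the function below evaluates a force whose magnitude depends only on the distance r to the center and whose direction is the unit vector pointing away from that center. The function names and the inverse-square profile are hypothetical choices for illustration; the text does not specify a particular f(r).

```python
import math

def central_force(position, center, f):
    """Force vector f(r) * r_hat acting on a unit-mass agent at `position`."""
    offset = [p - c for p, c in zip(position, center)]
    r = math.sqrt(sum(d * d for d in offset))
    if r == 0.0:
        raise ValueError("direction is undefined at the center of force")
    r_hat = [d / r for d in offset]   # unit vector pointing away from the center
    magnitude = f(r)                  # negative values pull the agent inward
    return [magnitude * u for u in r_hat]

def inverse_square(r, strength=1.0):
    """Hypothetical attractive profile: f(r) = -strength / r**2."""
    return -strength / (r * r)
```

Calling `central_force((3.0, 4.0), (0.0, 0.0), inverse_square)` gives a vector that points from the agent toward the origin, since the profile is negative everywhere. Swapping in a different `f` changes the interaction without touching the geometry.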
Theory of Optimal Bayesian Feature Filtering
Foroughi pour, Ali, Dalton, Lori A.
Optimal Bayesian feature filtering (OBF) is a supervised screening method designed for biomarker discovery. In this article, we prove two major theoretical properties of OBF. First, optimal Bayesian feature selection under a general family of Bayesian models reduces to filtering if and only if the underlying Bayesian model assumes all features are mutually independent. Therefore, OBF is optimal if and only if one assumes all features are mutually independent, and OBF is the only filter method that is optimal under at least one model in the general Bayesian framework. Second, OBF under independent Gaussian models is consistent under very mild conditions, including cases where the data is non-Gaussian with correlated features. This result provides conditions where OBF is guaranteed to identify the correct feature set given enough data, and it justifies the use of OBF in non-design settings where its assumptions are invalid.
Non-Bayesian Social Learning with Uncertain Models
Hare, James Z., Uribe, Cesar A., Kaplan, Lance, Jadbabaie, Ali
Non-Bayesian social learning theory provides a framework that models distributed inference for a group of agents interacting over a social network. In this framework, each agent iteratively forms beliefs about an unknown state of the world and communicates them to its neighbors using a learning rule. Existing approaches assume agents have access to precise statistical models (in the form of likelihoods) for the state of the world. However, in many situations, such models must be learned from finite data. We propose a social learning rule that takes into account uncertainty in the statistical models using second-order probabilities. As a result, beliefs derived from uncertain models are sensitive to the amount of past evidence collected for each hypothesis. We characterize how well the hypotheses can be tested on a social network, as consistent or not with the state of the world. We explicitly show how the generated beliefs depend on the amount of prior evidence. Moreover, as the amount of prior evidence goes to infinity, learning occurs and is consistent with traditional social learning theory.
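For orientation, the sketch below shows one round of a log-linear update of the kind commonly used in traditional non-Bayesian social learning, where likelihoods are assumed to be known exactly. It is not the uncertain-model rule proposed in this abstract, and the function name, argument layout, and choice of rule are assumptions for illustration.

```python
import math

def social_update(beliefs, weights, likelihoods):
    """One round of a log-linear social learning update.

    beliefs[i][k]:     agent i's current belief in hypothesis k (must be > 0)
    weights[i][j]:     weight agent i places on agent j (each row sums to 1)
    likelihoods[i][k]: likelihood of agent i's new observation under hypothesis k (> 0)
    """
    n_agents = len(beliefs)
    n_hyp = len(beliefs[0])
    updated = []
    for i in range(n_agents):
        log_post = []
        for k in range(n_hyp):
            # Weighted geometric average of the neighbors' beliefs, in log space.
            pooled = sum(weights[i][j] * math.log(beliefs[j][k])
                         for j in range(n_agents))
            # Bayesian-style correction with the agent's own new observation.
            log_post.append(math.log(likelihoods[i][k]) + pooled)
        m = max(log_post)
        unnorm = [math.exp(v - m) for v in log_post]
        total = sum(unnorm)
        updated.append([u / total for u in unnorm])
    return updated
```

Each agent first pools its neighbors' beliefs and then reweights the result by how well each hypothesis explains its own latest observation. A rule for uncertain models would replace the exact `likelihoods` with quantities that account for how much evidence backs each model.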
Addressing Design Issues in Medical Expert System for Low Back Pain Management: Knowledge Representation, Inference Mechanism, and Conflict Resolution Using Bayesian Network
Santra, Debarpita, Mandal, Jyotsna Kumar, Basu, Swapan Kumar, Goswami, Subrata
With the aim of developing a medical expert system for low back pain management, the paper proposes an efficient knowledge representation scheme using frame data structures and derives a reliable conflict resolution logic using a Bayesian Network. When a patient comes to the intended expert system for diagnosis, the proposed inference engine outputs a number of probable diseases in sorted order, each associated with a numeric measure indicating its possibility of occurrence. When two or more diseases in the list have the same or similar possibility of occurrence, the Bayesian Network is used for conflict resolution. The proposed scheme has been validated with thirty empirically selected patient cases. Taking the expected value of 0.75 as the level of acceptance, the proposed system offers diagnostic inference with a standard deviation of 0.029. The computed value of the Chi-Squared test is 11.08 with 12 degrees of freedom, implying that the results derived from the designed system are homogeneous with the expected outcomes. Prior to any clinical investigations on the selected low back pain patients, the proposed system achieved an average accuracy level of 73.89%, which is quite close to the expected clinical accuracy level of 75%.
Order-free Learning Alleviating Exposure Bias in Multi-label Classification
Multi-label classification (MLC) assigns multiple labels to each sample. Prior studies show that MLC can be transformed into a sequence prediction problem with a recurrent neural network (RNN) decoder to model the label dependency. However, training an RNN decoder requires a predefined order of labels, which is not directly available in the MLC specification. Moreover, an RNN trained in this way tends to overfit the label combinations in the training set and has difficulty generating unseen label sequences. In this paper, we propose a new framework for MLC which does not rely on a predefined label order and thus alleviates exposure bias. The experimental results on three multi-label classification benchmark datasets show that our method outperforms competitive baselines by a large margin. We also find that the proposed approach has a higher probability of generating label combinations not seen during training than the baseline models. This result shows that the proposed approach has better generalization capability.