Uncertainty
SlimShot: Probabilistic Inference for Web-Scale Knowledge Bases
Gribkoff, Eric (University of Washington) | Suciu, Dan (University of Washington)
Increasingly large Knowledge Bases are being created, by crawling the Web or other corpora of documents, and by extracting facts and relations using machine learning techniques. To manage the uncertainty in the data, these KBs rely on probabilistic engines based on Markov Logic Networks (MLN), for which probabilistic inference remains a major challenge. Today's state of the art systems reduce the task of inference to weighted model counting and use an MCMC algorithm wrapped around SampleSAT to generate approximately uniform samples. This approach offers no theoretical error guarantees and, as we show, suffers from poor performance in practice. In this paper we describe SlimShot (Scalable Lifted Inference and Monte Carlo Sampling Hybrid Optimization Technique), a probabilistic inference engine for Web-Scale knowledge bases. SlimShot converts the MLN to a tuple-independent probabilistic database, then uses a simple Monte Carlo-based inference, with three key enhancements: (1) it combines sampling with safe query evaluation, (2) it estimates a conditional probability by jointly computing the numerator and denominator, and (3) it adjusts the proposal distribution based on the sample cardinality. In combination, these three techniques allow us to give formal error guarantees, and we demonstrate empirically that SlimShot outperforms today's state of the art probabilistic inference engines used in knowledge bases.
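A minimal sketch of enhancement (2) alone, estimating a conditional probability from one shared set of samples over a tuple-independent database. It is not SlimShot itself: safe query evaluation and the adjusted proposal distribution are omitted, and every name here (`estimate_conditional`, `tuple_probs`, the toy tuples) is a hypothetical chosen for illustration.

```python
import random

def estimate_conditional(tuple_probs, query, constraint, n_samples=20000, seed=0):
    """Naive Monte Carlo estimate of P(query | constraint).

    Each tuple is included in a sampled world independently with its own
    probability. The numerator and denominator counts come from the same
    samples, so they are estimated jointly.
    """
    rng = random.Random(seed)
    hits_constraint = 0  # worlds satisfying the constraint (denominator)
    hits_both = 0        # worlds satisfying constraint and query (numerator)
    for _ in range(n_samples):
        world = {t for t, p in tuple_probs.items() if rng.random() < p}
        if constraint(world):
            hits_constraint += 1
            if query(world):
                hits_both += 1
    if hits_constraint == 0:
        return None  # the constraint was never satisfied in the samples
    return hits_both / hits_constraint

# Toy database (assumed for illustration): two independent tuples.
tuple_probs = {"a": 0.5, "b": 0.5}
constraint = lambda world: "a" in world or "b" in world
query = lambda world: "a" in world

estimate = estimate_conditional(tuple_probs, query, constraint)
print(estimate)  # exact value for this toy case is 0.5 / 0.75 = 2/3
```

In this toy case the exact answer is available in closed form, which makes it easy to check the estimate by hand.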
Satisfiability and Model Counting in Open Universes
SAT and #SAT are at the heart of many important problem formulations in AI, the most prominent being reasoning and learning in first-order and probabilistic knowledge bases. In practice, all contemporary systems resort to domain closure: objects in the universe are all and only the ones mentioned in the knowledge base. This is in stark contrast to the natural ability of human beings to infer things about sensory inputs and unforeseen data: they infer the existence of objects from their observations; no predefined list of objects is given or known in advance. In this paper, we introduce the formal foundations for reasoning in open universes in a general way, purely based on SAT and #SAT technology.
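For readers new to the terminology, the sketch below shows what #SAT computes in the simplest closed setting: a brute-force count of the satisfying assignments of a propositional CNF formula. It does not implement the open-universe reasoning of the paper. The function name `count_models` and the clause encoding (signed integers, as in DIMACS files) are assumptions made for this example.

```python
from itertools import product

def count_models(clauses, n_vars):
    """Count satisfying assignments of a CNF formula by enumeration.

    A clause is a list of non-zero integers: k means variable k is true,
    -k means variable k is false. Exponential in n_vars, so only usable
    on tiny formulas.
    """
    count = 0
    for assignment in product([False, True], repeat=n_vars):
        if all(
            any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            count += 1
    return count

# (x1 or x2) and (not x1 or x3)
clauses = [[1, 2], [-1, 3]]
n_models = count_models(clauses, 3)
print(n_models)      # 4
print(n_models > 0)  # SAT is the special case "is the count non-zero?"
```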
Assessing forensic evidence by computing belief functions
Kerkvliet, Timber, Meester, Ronald
We first discuss certain problems with the classical probabilistic approach for assessing forensic evidence, in particular its inability to distinguish between lack of belief and disbelief, and its inability to model complete ignorance within a given population. We then discuss Shafer belief functions, a generalization of probability distributions, which can deal with both these objections. We use a calculus of belief functions which does not use the much criticized Dempster rule of combination, but only the very natural Dempster-Shafer conditioning. We then apply this calculus to some classical forensic problems like the various island problems and the problem of parental identification. If we impose no prior knowledge apart from assuming that the culprit or parent belongs to a given population (something which is possible in our setting), then our answers differ from the classical ones when uniform or other priors are imposed. We can actually retrieve the classical answers by imposing the relevant priors, so our setup can and should be interpreted as a generalization of the classical methodology, allowing more flexibility. We show how our calculus can be used to develop an analogue of Bayes' rule, with belief functions instead of classical probabilities. We also discuss consequences of our theory for legal practice.
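The difference between lack of belief and disbelief can be made concrete with the standard Shafer definitions of belief and plausibility. The sketch below is not the authors' calculus and includes no conditioning step. The population, the names `belief` and `plausibility`, and the mass assignment are assumptions for illustration.

```python
def belief(mass, event):
    """Bel(A): total mass of the focal sets contained in A."""
    return sum(m for focal, m in mass.items() if focal <= event)

def plausibility(mass, event):
    """Pl(A): total mass of the focal sets that intersect A."""
    return sum(m for focal, m in mass.items() if focal & event)

# Complete ignorance within a population: all mass sits on the whole set.
population = frozenset({"p1", "p2", "p3"})
ignorance = {population: 1.0}

suspect = frozenset({"p1"})
others = population - suspect

print(belief(ignorance, suspect))        # 0: no belief that p1 is the culprit
print(belief(ignorance, others))         # 0: and no belief that p1 is not
print(plausibility(ignorance, suspect))  # 1.0: p1 remains fully plausible
```

A uniform probability prior over the same population would instead assign 1/3 to the suspect and 2/3 to the others, which is a statement of partial disbelief, not of ignorance.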
Stochastic And-Or Grammars: A Unified Framework and Logic Perspective
Formal grammars are a popular class of knowledge representation that is traditionally confined to the modeling of natural and computer languages. However, several extensions of grammars have been proposed over time to model other types of data such as images [1, 2, 3] and events [4, 5, 6]. One prominent type of extension is stochastic And-Or grammars (AOG) [2]. A stochastic AOG simultaneously models compositions (i.e., a large pattern is the composition of several small patterns arranged according to a certain configuration) and reconfigurations (i.e., a pattern may have several alternative configurations), and in this way it can compactly represent a probabilistic distribution over a large number of patterns. Stochastic AOGs can be used to parse data samples into their compositional structures, which help solve multiple tasks (such as classification, annotation, and segmentation of the data samples) in a unified manner. This work was supported by the National Natural Science Foundation of China (61503248).
Interactive machine learning for health informatics: when do we need the human-in-the-loop? - Springer
Originally the term "machine learning" was defined as "... artificial generation of knowledge from experience," and the first studies were performed with games, i.e., with the game of checkers [1]. Today, machine learning (ML) is the fastest growing technical field, at the intersection of informatics and statistics, tightly connected with data science and knowledge discovery, and health is among the greatest challenges [2, 3]. Particularly, probabilistic ML is extremely useful for health informatics, where most problems involve dealing with uncertainty. The theoretical basis for probabilistic ML was laid by Thomas Bayes (1701–1761) [4, 5]. Probabilistic inference vastly influenced artificial intelligence and statistical learning, and inverse probability allows us to infer unknowns, learn from data and make predictions [6, 7].
Bayesian machine learning - FastML
So you know the Bayes rule. How does it relate to machine learning? It can be quite difficult to grasp how the puzzle pieces fit together - we know it took us a while. This article is an introduction we wish we had back then. While we have some grasp on the matter, we're not experts, so the following might contain inaccuracies or even outright errors. Feel free to point them out, either in the comments or privately.
Stability and Structural Properties of Gene Regulation Networks with Coregulation Rules
Warrell, Jonathan H., Mhlanga, Musa M.
Coregulation of the expression of groups of genes has been extensively demonstrated empirically in bacterial and eukaryotic systems. Such coregulation can arise through the use of shared regulatory motifs, which allow the coordinated expression of modules (and module groups) of functionally related genes across the genome. Coregulation can also arise through the physical association of multi-gene complexes through chromosomal looping, which are then transcribed together. We present a general formalism for modeling coregulation rules in the framework of Random Boolean Networks (RBN), and develop specific models for transcription factor networks with modular structure (including module groups, and multi-input modules (MIM) with autoregulation) and multi-gene complexes (including hierarchical differentiation between multi-gene complex members). We develop a mean-field approach to analyse the stability of large networks incorporating coregulation, and show that autoregulated MIM and hierarchical gene-complex models can achieve greater stability than networks without coregulation whose rules have matching activation frequency. We provide further analysis of the stability of small networks of both kinds through simulations. We also characterize several general properties of the transients and attractors in the hierarchical coregulation model, and show using simulations that the steady-state distribution factorizes hierarchically as a Bayesian network in a Markov Jump Process analogue of the RBN model.
Reading Ian Goodfellow's new deep learning book and can't figure out how to derive a conditional probability. Can someone help? • /r/MachineLearning
It's a constant that you use to normalize, right? And what comes after the normalizing constant in the equation is a vector, right? The authors are using Z' so that you know that the vector always gets normalized; you don't just calculate a constant at the start of training and reuse the same constant each time you calculate as the vector moves off normal.
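The point about recomputing the constant can be sketched generically. The thread does not reproduce the book's equation, so the code below is not that derivation. It only shows that a normalizer is a function of the current unnormalized vector and must be recomputed whenever that vector changes. The name `normalize` and the example numbers are made up.

```python
def normalize(unnormalized):
    """Divide by the sum so that the entries add up to 1."""
    z = sum(unnormalized)  # the normalizing constant for this vector
    return [u / z for u in unnormalized]

before = [1.0, 3.0]  # normalizer is 4.0
after = [2.0, 3.0]   # the vector moved, so the normalizer is now 5.0

p_before = normalize(before)
p_after = normalize(after)

# Reusing the old constant leaves the new vector unnormalized.
stale = [u / sum(before) for u in after]
print(sum(p_before), sum(p_after), sum(stale))
```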
Fuzzy.io Wants to Democratize Artificial Intelligence For All Developers - The New Stack
While there may be millions of developers, there simply aren't enough data scientists to go around, and most of them are committed to working for large companies with big budgets and humongous data sets. Companies like Montreal-based Fuzzy.io are filling in the talent gap by offering an API to a set of artificial intelligence (AI) services that allows web and mobile developers to easily incorporate AI-based decision-making into their projects -- ranging from recommendations, to dynamic pricing decisions, and matching users in marketplaces. "Most of the existing ML development services are built to be used by data scientists or developers who have expertise in building AI/ML systems," said Fuzzy.io co-founder Matt Fogel. "Additionally, most of these tools require the developer to bring a great deal of data in order to train custom models." The company was founded by Fogel, formerly product vice president at Agendize, along with serial entrepreneur and developer Evan Prodromou. The company also recently added Kevin Fox, who, when he was at Google, helped create the user interfaces for Gmail and Google Calendar. These virtual intelligent machines use an adaptive rule base to translate pre-set, intuitive and vague "business rules" into a framework that can generate precise results. It could be as vague as "new", "old", "warm" and "good," as the company explains on its blog: "A fuzzy agent accepts some input variables and maps them onto fuzzy sets -- intuitive terms from the problem domain.