# Uncertainty

### Posterior Probability

In statistics, the posterior probability expresses how likely a hypothesis is given a particular set of data. This contrasts with the likelihood function, written P(D|H). The distinction is one of interpretation rather than a mathematical property, since both take the form of a conditional probability. To calculate the posterior probability, we use Bayes' theorem, which gives the probability of a hypothesis given some prior observable data. It combines the likelihood P(D|H) with the prior P(H) and the marginal likelihood P(D) to yield the posterior P(H|D) = P(D|H) P(H) / P(D).
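As a minimal numeric sketch of the theorem above (the disease-testing numbers here are illustrative, not from the text), the posterior follows directly from the likelihood, prior, and marginal likelihood:

```python
def posterior(prior, likelihood, marginal):
    """Bayes' theorem: P(H|D) = P(D|H) * P(H) / P(D)."""
    return likelihood * prior / marginal

# Hypothetical example: a test with 99% sensitivity, a 5% false-positive
# rate, and 1% prevalence of the condition.
p_h = 0.01                                      # prior P(H)
p_d_given_h = 0.99                              # likelihood P(D|H)
p_d = p_d_given_h * p_h + 0.05 * (1 - p_h)      # marginal P(D), by total probability

p_h_given_d = posterior(p_h, p_d_given_h, p_d)  # ≈ 0.167
```

Despite the accurate test, the posterior stays below 17% because the prior P(H) is small; weighting the likelihood by the prior is exactly the correction Bayes' theorem encodes.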

### Bayesian models in R

If there was one thing that always frustrated me, it was not fully understanding Bayesian inference. Sometime last year, I came across an article about a TensorFlow-supported R package for Bayesian analysis called greta. Back then, I searched for greta tutorials and stumbled on a blog post that praised a textbook called Statistical Rethinking: A Bayesian Course with Examples in R and Stan by Richard McElreath. I had found a solution to my lingering frustration, so I bought a copy straight away. I spent the last few months reading it cover to cover and solving the proposed exercises, which are heavily based on the rethinking package. I cannot recommend it highly enough to anyone seeking a solid grip on Bayesian statistics, in both theory and application. This post ought to be my most gratifying blogging experience so far, in that I am essentially reporting my own recent learning; I am convinced this will make the storytelling all the more effective. As a demonstration, the female cuckoo reproductive output data recently analysed by Riehl et al., 2019 [1] will be modelled. In the process, we will conduct MCMC sampling, visualise posterior distributions, generate predictions and ultimately assess the influence of social parasitism on female reproductive output. You should have some familiarity with standard statistical models.

### A knowledge-based intelligence system for control of dirt recognition process in the smart washing machines

In this paper, we propose an intelligent approach based on fuzzy logic to model the human intelligence involved in washing clothes. First, an intelligent feedback loop is designed for perception-based sensing of dirt, inspired by human color understanding. Then, since color stains can leak out of some colored clothes, human probabilistic decision making is computationally modeled to detect this stain leakage, so that the problem of distinguishing dirt from stains can be handled in the washing process. Finally, we discuss the fuzzy control of washing clothes, and design and simulate a smart controller based on the fuzzy intelligence feedback loop.

### Model Comparison for Semantic Grouping

We introduce a probabilistic framework for quantifying the semantic similarity between two groups of embeddings. We formulate the task of semantic similarity as a model comparison task in which we contrast a generative model which jointly models two sentences versus one that does not. We illustrate how this framework can be used for the Semantic Textual Similarity tasks using clear assumptions about how the embeddings of words are generated. We apply model comparison that utilises information criteria to address some of the shortcomings of Bayesian model comparison, whilst still penalising model complexity. We achieve competitive results by applying the proposed framework with an appropriate choice of likelihood on the STS datasets.

### Ensemble Distribution Distillation

Ensembles of Neural Network (NN) models are known to yield improvements in accuracy. Furthermore, they have been empirically shown to yield robust measures of uncertainty, though without theoretical guarantees. However, ensembles come at a high computational and memory cost, which may be prohibitive for certain applications. There has been significant work on distilling an ensemble into a single model. Such approaches decrease computational cost and allow a single model to achieve accuracy comparable to that of an ensemble. However, information about the \emph{diversity} of the ensemble, which can yield estimates of \emph{knowledge uncertainty}, is lost. Recently, a new class of models, called Prior Networks, has been proposed, which allows a single neural network to explicitly model a distribution over output distributions, effectively emulating an ensemble. In this work, ensembles and Prior Networks are combined to yield a novel approach called \emph{Ensemble Distribution Distillation} (EnD$^2$), which distils an ensemble into a single Prior Network. This allows a single model to retain both the improved classification performance of the ensemble and its measures of diversity. In this initial investigation, the properties of EnD$^2$ are examined and confirmed on an artificial dataset.
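The link between ensemble diversity and knowledge uncertainty mentioned above can be sketched with a standard entropy decomposition (a generic illustration, not the paper's EnD$^2$ training procedure): total uncertainty, the entropy of the averaged prediction, splits into expected data uncertainty plus a mutual-information term, and that term is exactly the diversity-driven knowledge uncertainty which naive distillation discards.

```python
import numpy as np

def uncertainty_decomposition(probs):
    """Split ensemble uncertainty into data and knowledge components.

    probs: array of shape (n_members, n_classes), one predictive
    distribution per ensemble member.
    """
    eps = 1e-12
    mean_p = probs.mean(axis=0)
    total = -(mean_p * np.log(mean_p + eps)).sum()            # entropy of the mean
    data = -(probs * np.log(probs + eps)).sum(axis=1).mean()  # mean per-member entropy
    knowledge = total - data                                  # mutual information
    return total, data, knowledge

# Two confident but disagreeing members: low data uncertainty,
# high knowledge uncertainty.
total, data, knowledge = uncertainty_decomposition(
    np.array([[0.9, 0.1], [0.1, 0.9]]))
```

When all members agree the mutual-information term vanishes, which is why a conventionally distilled single model, matching only the averaged prediction, can no longer separate the two sources of uncertainty.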

### Predictive Situation Awareness for Ebola Virus Disease using a Collective Intelligence Multi-Model Integration Platform: Bayes Cloud

Humanity has faced a plethora of challenges associated with infectious diseases, which kill more than 6 million people a year. Although continuous efforts have been made to mitigate the potential damage from such events, many persistent challenges remain to be overcome. One related issue we address here is the assessment and prediction of such epidemics. In this field of study, traditional and ad-hoc models frequently fail to provide proper predictive situation awareness (PSAW), characterized by understanding the current situation and predicting future situations. Comprehensive PSAW for infectious disease can support decision making and help hinder disease spread. In this paper, we develop a computing system platform focused on collective-intelligence causal modeling to support PSAW in the domain of infectious disease. Analyses of global epidemics require the integration of many different data sources and models, which may originate from multiple independent researchers. These models should be integrated to assess and predict the infectious disease accurately from a holistic view. The system provides three main functions: (1) collaborative causal modeling, (2) causal model integration, and (3) causal model reasoning. These functions are supported by subject-matter experts and artificial intelligence (AI), with uncertainty treatment. Subject-matter experts, as collective intelligence, develop causal models and integrate them into one joint causal model. The integrated causal model is used to reason about: (1) the past, regarding how the causal factors have occurred; (2) the present, regarding how the spread is progressing now; and (3) the future, regarding how it will proceed. Finally, we introduce one use case of predictive situation awareness for the Ebola virus disease.

### Encoding Categorical Variables with Conjugate Bayesian Models for WeWork Lead Scoring Engine

Applied Data Scientists throughout various industries are commonly faced with the challenging task of encoding high-cardinality categorical features into digestible inputs for machine learning algorithms. This paper describes a Bayesian encoding technique developed for WeWork's lead scoring engine which outputs the probability of a person touring one of our office spaces based on interaction, enrichment, and geospatial data. We present a paradigm for ensemble modeling which mitigates the need to build complicated preprocessing and encoding schemes for categorical variables. In particular, domain-specific conjugate Bayesian models are employed as base learners for features in a stacked ensemble model. For each column of a categorical feature matrix we fit a problem-specific prior distribution, for example, the Beta distribution for a binary classification problem. In order to analytically derive the moments of the posterior distribution, we update the prior with the conjugate likelihood of the corresponding target variable for each unique value of the given categorical feature. This function of column and value encodes the categorical feature matrix so that the final learner in the ensemble model ingests low-dimensional numerical input. Experimental results on both curated and real world datasets demonstrate impressive accuracy and computational efficiency on a variety of problem archetypes. Particularly, for the lead scoring engine at WeWork -- where some categorical features have as many as 300,000 levels -- we have seen an AUC improvement from 0.87 to 0.97 through implementing conjugate Bayesian model encoding.
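The per-level conjugate update described above can be sketched in plain Python for the binary-target Beta case (the function name and the Beta(1, 1) smoothing defaults are illustrative assumptions, not WeWork's implementation):

```python
from collections import defaultdict

def beta_encode(categories, target, alpha=1.0, beta=1.0):
    """Replace each level of a categorical feature with the posterior mean
    of a conjugate Beta model updated by the binary target:
    posterior = Beta(alpha + successes, beta + failures),
    mean = (alpha + successes) / (alpha + beta + trials).
    """
    successes = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(categories, target):
        successes[level] += y
        counts[level] += 1
    post_mean = {level: (alpha + successes[level]) / (alpha + beta + counts[level])
                 for level in counts}
    return [post_mean[level] for level in categories]

# "a": (1 + 1) / (2 + 2) = 0.5;  "b": (1 + 2) / (2 + 3) = 0.6
encoded = beta_encode(["a", "a", "b", "b", "b"], [1, 0, 1, 1, 0])
```

Because the posterior moments are available in closed form, even a feature with hundreds of thousands of levels is encoded in a single pass over the data, with no iterative fitting.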

### Neuromorphic Acceleration for Approximate Bayesian Inference on Neural Networks via Permanent Dropout

As neural networks have begun performing increasingly critical tasks for society, ranging from driving cars to identifying candidates for drug development, the value of their ability to perform uncertainty quantification (UQ) in their predictions has risen commensurately. Permanent dropout, a popular method for neural network UQ, involves injecting stochasticity into the inference phase of the model and creating many predictions for each of the test data. This shifts the computational and energy burden of deep neural networks from the training phase to the inference phase. Recent work has demonstrated near-lossless conversion of classical deep neural networks to their spiking counterparts. We use these results to demonstrate the feasibility of conducting the inference phase with permanent dropout on spiking neural networks, mitigating the technique's computational and energy burden, which is essential for its use at scale or on edge platforms. We demonstrate the proposed approach via the Nengo spiking neural simulator on a combination drug therapy dataset for cancer treatment, where UQ is critical. Our results indicate that the spiking approximation gives a predictive distribution practically indistinguishable from that given by the classical network.
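The inference-time dropout scheme described above can be sketched with NumPy (toy untrained weights for illustration; this is the classical stochastic-inference step, not the paper's spiking-network conversion): keep the dropout mask active at test time and aggregate many stochastic forward passes into a predictive mean and spread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network with fixed (pretend-trained) weights.
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 1))

def stochastic_forward(x, p_drop=0.5):
    """One forward pass with dropout left on at inference time."""
    h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
    mask = rng.random(h.shape) >= p_drop     # fresh Bernoulli dropout mask
    h = h * mask / (1.0 - p_drop)            # inverted-dropout scaling
    return h @ W2

x = rng.normal(size=(1, 4))
samples = np.array([stochastic_forward(x) for _ in range(200)])
pred_mean, pred_std = samples.mean(), samples.std()  # predictive mean and uncertainty
```

The many repeated forward passes are what shift the computational burden to inference, which is the cost the paper's spiking-hardware approach aims to mitigate.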

### Deep pNML: Predictive Normalized Maximum Likelihood for Deep Neural Networks

The Predictive Normalized Maximum Likelihood (pNML) scheme has been recently suggested for universal learning in the individual setting, where both the training and test samples are individual data. The goal of universal learning is to compete with a "genie" or reference learner that knows the data values but is restricted to use a learner from a given model class. The pNML minimizes the associated regret for any possible value of the unknown label. Furthermore, its min-max regret can serve as a pointwise measure of learnability for the specific training and data sample. In this work we examine the pNML and its associated learnability measure for the Deep Neural Network (DNN) model class. As shown, the pNML outperforms the commonly used Empirical Risk Minimization (ERM) approach and provides robustness against adversarial attacks. Together with its learnability measure it can detect out-of-distribution test examples, be tolerant to noisy labels and serve as a confidence measure for the ERM. Finally, we extend the pNML to a "twice universal" solution, which provides universality for model class selection and generates a learner competing with the best one from all model classes.

### Fuzzy Rule Interpolation Methods and Fri Toolbox

FRI methods are less popular in practical applications. One possible reason is the lack of a common framework: many FRI methods have been developed independently, with different interpolation concepts and features. One attempt to set up a common FRI framework was the MATLAB FRI Toolbox, developed by Johanyák et al. in 2006. The goals of this paper are as follows: firstly, to present a brief introduction to FRI methods; secondly, to give a brief description of the refreshed and extended version of the original FRI Toolbox; and thirdly, to use unified numerical benchmark examples to evaluate and analyse the fuzzy rule interpolation techniques (KH, KH Stabilized, MACI, IMUL, CRF, VKK, GM, FRIPOC, LESFRI, and SCALEMOVE), which are classified and compared on different features following the abnormality and linearity conditions [15].