bayesian analysis
Analysing the SEDs of protoplanetary disks with machine learning
Kaeufer, T., Woitke, P., Min, M., Kamp, I., Pinte, C.
ABRIDGED. The analysis of spectral energy distributions (SEDs) of protoplanetary disks to determine their physical properties is known to be highly degenerate. Hence, a Bayesian analysis is required to obtain parameter uncertainties and degeneracies. The challenge here is computational speed, as one radiative transfer model requires a couple of minutes to compute. We performed a Bayesian analysis for 30 well-known protoplanetary disks to determine their physical disk properties, including uncertainties and degeneracies. To circumvent the computational cost problem, we created neural networks (NNs) to emulate the SED generation process. We created two sets of radiative transfer disk models to train and test two NNs that predict SEDs for continuous and discontinuous disks. A Bayesian analysis was then performed on 30 protoplanetary disks with SED data collected by the DIANA project to determine the posterior distributions of all parameters. We ran this analysis twice: (i) with old distances and additional parameter constraints as used in a previous study, to compare results, and (ii) with updated distances and free choice of parameters, to obtain homogeneous and unbiased model parameters. We evaluated the uncertainties in the determination of physical disk parameters from SED analysis, and detected and quantified the strongest degeneracies. The NNs are able to predict SEDs within 1 ms with uncertainties of about 5% compared to the true SEDs obtained by the radiative transfer code. We find parameter values and uncertainties that are significantly different from previous values obtained by $\chi^2$ fitting. Comparing the global evidence for continuous and discontinuous disks, we find that 26 out of 30 objects are better described by disks that have two distinct radial zones. We also created an interactive tool that instantly returns the SED predicted by our NNs for any parameter combination.
Confusion Matrices and Accuracy Statistics for Binary Classifiers Using Unlabeled Data: The Diagnostic Test Approach
Sometimes it is important to know the accuracy of a classifier on unlabeled data. The labels may be delayed, as in consumer purchasing predictions, or obtaining them may be cost-prohibitive. The labels may not exist at all, as for some medical conditions for which the true gold standard diagnostic test (a 100% sensitive and 100% specific classifier) would require that subjects be euthanized and autopsied to obtain labels. Epidemiologists and biostatisticians have developed statistical methods for assessing the sensitivity (Se) and specificity (Sp) of diagnostic tests when gold standard comparison tests are unavailable. In data science terms, the diagnostic test assessment data are unlabeled. In this article, I describe how to modify those diagnostic test statistical methods to estimate confusion matrices and accuracy statistics for binary classifiers.
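To make the link between Se, Sp, and a confusion matrix concrete, here is a minimal sketch. It is not the estimation method of the article (which is not reproduced in this summary). It only shows how, once sensitivity, specificity, and prevalence have been estimated by some means, the expected confusion matrix follows by arithmetic. The function names and the numeric inputs are hypothetical.

```python
def expected_confusion_matrix(n, prevalence, sensitivity, specificity):
    """Expected cell counts for n cases, given prevalence, Se, and Sp."""
    positives = n * prevalence          # cases that are truly positive
    negatives = n - positives           # cases that are truly negative
    tp = positives * sensitivity        # true positives
    fn = positives - tp                 # false negatives
    tn = negatives * specificity        # true negatives
    fp = negatives - tn                 # false positives
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


def accuracy(cm):
    """Overall accuracy: correctly classified cases over all cases."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())


# Illustrative (made-up) inputs: 1000 cases, 20% prevalence, Se = 0.9, Sp = 0.8.
cm = expected_confusion_matrix(n=1000, prevalence=0.2,
                               sensitivity=0.9, specificity=0.8)
print(cm)
print(accuracy(cm))
```

With unlabeled data, the hard part is estimating the three inputs. The sketch deliberately leaves that step out.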
Fishing: The Bayesian Way of Analyzing Zero-inflated Data
Originally published on Towards AI. In past posts, I have shown several ways to apply Bayesian analysis for mostly normally distributed data.
Paper: Bayesian statistics and modelling
Bayesian statistics and modelling is an open access paper published by Nature Reviews as part of its first volume of Methods Primers. Bayesian statistics is an approach to data analysis based on Bayes' theorem, where available knowledge about parameters in a statistical model is updated with the information in observed data. The background knowledge is expressed as a prior distribution and combined with observational data in the form of a likelihood function to determine the posterior distribution. The posterior can also be used for making predictions about future events. This Primer paper describes the stages involved in Bayesian analysis, from specifying the prior and data models to deriving inference, model checking and refinement.
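The prior-to-posterior update described above can be illustrated with the simplest conjugate case. The sketch below assumes a Beta prior on a success probability and binomial data. This choice of model, the function names, and the numbers are illustrative assumptions, not taken from the Primer.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    A Beta(alpha, beta) prior combined with a binomial likelihood gives a
    Beta(alpha + successes, beta + failures) posterior.
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)


prior = (2, 2)                                   # weak prior centred on 1/2
posterior = beta_binomial_update(*prior, successes=7, failures=3)

# For this model the posterior mean is also the posterior predictive
# probability that the next trial is a success.
print(posterior, beta_mean(*posterior))
```

Most realistic models have no closed-form posterior like this one and need sampling or approximation instead.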
Fully Bayesian Analysis of the Relevance Vector Machine Classification for Imbalanced Data
Wang, Wenyang, Sun, Dongchu, He, Zhuoqiong
The Relevance Vector Machine (RVM) is a supervised learning algorithm extended from the Support Vector Machine (SVM) and based on a Bayesian sparsity model. Compared with the regression problem, RVM classification is difficult to carry out because there is no closed-form solution for the posterior of the weight parameters. The original RVM classification algorithm used Newton's method to find the mode of the weight parameter posterior and then approximated the posterior by a Gaussian distribution via Laplace's method. This works, but it merely applies frequentist methods within a Bayesian framework. This paper proposes a Generic Bayesian approach for RVM classification. We conjecture that our algorithm achieves convergent estimates of the quantities of interest, in contrast to the nonconvergent estimates of the original RVM classification algorithm. Furthermore, a Fully Bayesian approach with a hierarchical hyperprior structure for RVM classification is proposed, which improves classification performance, especially for imbalanced data. In numerical studies, our proposed algorithms obtain high classification accuracy. The Fully Bayesian hierarchical hyperprior method outperforms the Generic one for imbalanced data classification.
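The step attributed to the original algorithm (Newton's method to find the posterior mode, then a Gaussian fitted at that mode) can be sketched on a toy problem. The code below is not the RVM and not the paper's method. It assumes a one-weight logistic model with a fixed Gaussian prior, whereas the RVM uses a kernel basis with per-weight hyperparameters. All names and data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_posterior_derivatives(w, xs, ys, prior_precision):
    """First and second derivative of the log posterior with respect to w."""
    ps = [sigmoid(w * x) for x in xs]
    grad = sum((y - p) * x for x, y, p in zip(xs, ys, ps)) - prior_precision * w
    hess = -sum(p * (1.0 - p) * x * x for x, p in zip(xs, ps)) - prior_precision
    return grad, hess


def laplace_approximation(xs, ys, prior_precision=1.0, max_iter=50, tol=1e-10):
    """Return (mean, variance) of a Gaussian approximation to p(w | data).

    Assumed model: y_i ~ Bernoulli(sigmoid(w * x_i)), w ~ N(0, 1 / prior_precision).
    """
    w = 0.0
    for _ in range(max_iter):
        grad, hess = log_posterior_derivatives(w, xs, ys, prior_precision)
        step = grad / hess
        w -= step                      # Newton update towards the mode
        if abs(step) < tol:
            break
    _, hess = log_posterior_derivatives(w, xs, ys, prior_precision)
    return w, -1.0 / hess              # variance = inverse negative curvature


# Tiny made-up data set.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
mode, variance = laplace_approximation(xs, ys)
print(mode, variance)
```

The approximation describes the posterior only near its mode, which is part of why a sampling-based treatment of the full posterior can behave differently.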
A Bayesian Analysis of Dynamics in Free Recall
Socher, Richard, Gershman, Samuel, Sederberg, Per, Norman, Kenneth, Perotte, Adler J., Blei, David M.
We develop a probabilistic model of human memory performance in free recall experiments. In these experiments, a subject first studies a list of words and then tries to recall them. To model these data, we draw on both previous psychological research and statistical topic models of text documents. We assume that memories are formed by assimilating the semantic meaning of studied words (represented as a distribution over topics) into a slowly changing latent context (represented in the same space). During recall, this context is reinstated and used as a cue for retrieving studied words.
Bayesian Analysis with Python – Second Edition
Bayesian Analysis with Python – Second Edition is a step-by-step guide to conducting Bayesian data analyses using PyMC3 and ArviZ. The second edition of Bayesian Analysis with Python is an introduction to the main concepts of applied Bayesian inference and its practical implementation in Python using PyMC3, a state-of-the-art probabilistic programming library, and ArviZ, a new library for exploratory analysis of Bayesian models.
Leveraging Bayesian Analysis To Improve Accuracy of Approximate Models
Nadiga, Balasubramanya T., Jiang, Chiyu, Livescu, Daniel
We focus on improving the accuracy of an approximate model of a multiscale dynamical system that uses a set of parameter-dependent terms to account for the effects of unresolved or neglected dynamics on resolved scales. We start by considering various methods of calibrating and analyzing such a model given a few well-resolved simulations. After presenting results for various point estimates and discussing some of their shortcomings, we demonstrate (a) the potential of hierarchical Bayesian analysis to uncover previously unanticipated physical dependencies in the approximate model, and (b) how such insights can then be used to improve the model. In effect, parametric dependencies found from the Bayesian analysis are used to improve structural aspects of the model. While we choose to illustrate the procedure in the context of a closure model for buoyancy-driven, variable-density turbulence, the statistical nature of the approach makes it more generally applicable. To address the increased computational cost associated with the procedure, we demonstrate the use of a neural-network-based surrogate in accelerating the posterior sampling process and point to recent developments in variational inference as an alternative methodology for greatly mitigating such costs. We conclude by suggesting that modern validation and uncertainty quantification techniques such as the ones we consider have a valuable role to play in the development and improvement of approximate models.