Bayesian Learning
Computational Cognitive Science lab: Reading list on Bayesian methods
This list is intended to introduce some of the tools of Bayesian statistics and machine learning that can be useful to computational research in cognitive science. The first section mentions several useful general references, and the others provide supplementary readings on specific topics. If you would like to suggest some additions to the list, contact Tom Griffiths.
The decoupled extended Kalman filter for dynamic exponential-family factorization models
Gomez-Uribe, Carlos Alberto, Karrer, Brian
We specialize the decoupled extended Kalman filter (DEKF) for online parameter learning in factorization models, including factorization machines, matrix and tensor factorization, and illustrate the effectiveness of the approach through simulations. Learning model parameters through the DEKF makes factorization models more broadly useful by allowing for more flexible observations through the entire exponential family, modeling parameter drift, and producing parameter uncertainty estimates that can enable explore/exploit and other applications. We use a more general dynamics of the parameters than the standard DEKF, allowing parameter drift while encouraging reasonable values. We also present an alternate derivation of the regular extended Kalman filter and DEKF that connects these methods to natural gradient methods, and suggests a similarly decoupled version of the iterated extended Kalman filter.
Semantically Enhanced Dynamic Bayesian Network for Detecting Sepsis Mortality Risk in ICU Patients with Infection
Wang, Tony, Velez, Tom, Apostolova, Emilia, Tschampel, Tim, Ngo, Thuy L., Hardison, Joy
Although timely sepsis diagnosis and prompt interventions in Intensive Care Unit (ICU) patients are associated with reduced mortality, early clinical recognition is frequently impeded by nonspecific signs of infection and failure to detect signs of sepsis-induced organ dysfunction in a constellation of dynamically changing physiological data. The goal of this work is to identify patient at risk of life-threatening sepsis utilizing a data-centered and machine learning-driven approach. We derive a mortality risk predictive dynamic Bayesian network (DBN) guided by a customized sepsis knowledgebase and compare the predictive accuracy of the derived DBN with the Sepsis-related Organ Failure Assessment (SOFA) score, the Quick SOFA (qSOFA) score, the Simplified Acute Physiological Score (SAPS-II) and the Modified Early Warning Score (MEWS) tools. A customized sepsis ontology was used to derive the DBN node structure and semantically characterize temporal features derived from both structured physiological data and unstructured clinical notes. We assessed the performance in predicting mortality risk of the DBN predictive model and compared performance to other models using Receiver Operating Characteristic (ROC) curves, area under curve (AUROC), calibration curves, and risk distributions. The derived dataset consists of 24,506 ICU stays from 19,623 patients with evidence of suspected infection, with 2,829 patients deceased at discharge. The DBN AUROC was found to be 0.91, which outperformed the SOFA (0.843), qSOFA (0.66), MEWS (0.729), and SAPS-II (0.766) scoring tools. Continuous Net Reclassification Index and Integrated Discrimination Improvement analysis supported the superiority DBN with respect to SOFA, qSOFA, MEWS, and SAPS-II. Compared with conventional rule-based risk scoring tools, the sepsis knowledgebase-driven DBN algorithm offers improved performance for predicting mortality of infected patients in intensive care units.
Bayesian methods for low-rank matrix estimation: short survey and theoretical study
The problem of low-rank matrix estimation recently received a lot of attention due to challenging applications. A lot of work has been done on rank-penalized methods and convex relaxation, both on the theoretical and applied sides. However, only a few papers considered Bayesian estimation. In this paper, we review the different type of priors considered on matrices to favour low-rank. We also prove that the obtained Bayesian estimators, under suitable assumptions, enjoys the same optimality properties as the ones based on penalization.
Asymptotic Properties of Recursive Maximum Likelihood Estimation in Non-Linear State-Space Models
Tadic, Vladislav Z. B., Doucet, Arnaud
Using stochastic gradient search and the optimal filter derivative, it is possible to perform recursive (i.e., online) maximum likelihood estimation in a non-linear state-space model. As the optimal filter and its derivative are analytically intractable for such a model, they need to be approximated numerically. In [Poyiadjis, Doucet and Singh, Biometrika 2018], a recursive maximum likelihood algorithm based on a particle approximation to the optimal filter derivative has been proposed and studied through numerical simulations. Here, this algorithm and its asymptotic behavior are analyzed theoretically. We show that the algorithm accurately estimates maxima to the underlying (average) log-likelihood when the number of particles is sufficiently large. We also derive (relatively) tight bounds on the estimation error. The obtained results hold under (relatively) mild conditions and cover several classes of non-linear state-space models met in practice.
Why Interpretability in Machine Learning? An Answer Using Distributed Detection and Data Fusion Theory
Varshney, Kush R., Khanduri, Prashant, Sharma, Pranay, Zhang, Shan, Varshney, Pramod K.
As artificial intelligence is increasingly affecting all parts of society and life, there is growing recognition that human interpretability of machine learning models is important. It is often argued that accuracy or other similar generalization performance metrics must be sacrificed in order to gain interpretability. Such arguments, however, fail to acknowledge that the overall decision-making system is composed of two entities: the learned model and a human who fuses together model outputs with his or her own information. As such, the relevant performance criteria should be for the entire system, not just for the machine learning component. In this work, we characterize the performance of such two-node tandem data fusion systems using the theory of distributed detection. In doing so, we work in the population setting and model interpretable learned models as multi-level quantizers. We prove that under our abstraction, the overall system of a human with an interpretable classifier outperforms one with a black box classifier.
Learning dynamical systems with particle stochastic approximation EM
Svensson, Andreas, Lindsten, Fredrik
Learning of dynamical systems, or state-space models, is central to many machine learning problems, such as reinforcement learning, sequence modeling, and autonomous systems. Furthermore, state-space models are at the core of recent model developments within the machine learning area, such as Gaussian process state-space models (Frigola et al. 2014a; Mattos et al. 2016; etc.), infinite factorial dynamical models (Gael et al., 2009; Valera et al., 2015), and stochastic recurrent neural networks (Fraccaro et al., 2016, for example). A strategy to learn state-space models, independently suggested by Digalakis et al. (1993) and Ghahramani and Hinton (1996), is the use of the Expectation Maximization (EM, Dempster et al. 1977) method. Even though originally proposed only for maximum likelihood estimation of linear models with Gaussian noise, the strategy can be generalized to the more challenging nonlinear and non-Gaussian cases, as well as the empirical Bayes setting. Many contributions have been made during the last decade, and this paper takes another step along the path towards a more computationally efficient method with a solid theoretical ground for learning of nonlinear dynamical systems.
Fundamental limits of detection in the spiked Wigner model
Alaoui, Ahmed El, Krzakala, Florent, Jordan, Michael I.
We study the fundamental limits of detecting the presence of an additive rank-one perturbation, or spike, to a Wigner matrix. When the spike comes from a prior that is i.i.d. across coordinates, we prove that the log-likelihood ratio of the spiked model against the non-spiked one is asymptotically normal below a certain reconstruction threshold which is not necessarily of a "spectral" nature, and that it is degenerate above. This establishes the maximal region of contiguity between the planted and null models. It is known that this threshold also marks a phase transition for estimating the spike: the latter task is possible above the threshold and impossible below. Therefore, both estimation and detection undergo the same transition in this random matrix model. We also provide further information about the performance of the optimal test. Our proofs are based on Gaussian interpolation methods and a rigorous incarnation of the cavity method, as devised by Guerra and Talagrand in their study of the Sherrington--Kirkpatrick spin-glass model.
Accelerating likelihood optimization for ICA on real signals
Ablin, Pierre, Cardoso, Jean-François, Gramfort, Alexandre
We study optimization methods for solving the maximum likelihood formulation of independent component analysis (ICA). We consider both the the problem constrained to white signals and the unconstrained problem. The Hessian of the objective function is costly to compute, which renders Newton's method impractical for large data sets. Many algorithms proposed in the literature can be rewritten as quasi-Newton methods, for which the Hessian approximation is cheap to compute. These algorithms are very fast on simulated data where the linear mixture assumption really holds. However, on real signals, we observe that their rate of convergence can be severely impaired. In this paper, we investigate the origins of this behavior, and show that the recently proposed Preconditioned ICA for Real Data (Picard) algorithm overcomes this issue on both constrained and unconstrained problems.
Probabilistic Inference Using Generators - The Statues Algorithm
We present here a new probabilistic inference algorithm that gives exact results in the domain of discrete probability distributions. This algorithm, named the Statues algorithm, calculates the marginal probability distribution on probabilistic models defined as direct acyclic graphs. These models are made up of well-defined primitives that allow to express, in particular, joint probability distributions, Bayesian networks, discrete Markov chains, conditioning and probabilistic arithmetic. The Statues algorithm relies on a variable binding mechanism based on the generator construct, a special form of coroutine; being related to the enumeration algorithm, this new algorithm brings important improvements in terms of efficiency, which makes it valuable in regard to other exact marginalization algorithms. After introduction of several definitions, primitives and compositional rules, we present in details the Statues algorithm. Then, we briefly discuss the interest of this algorithm compared to others and we present possible extensions. Finally, we introduce Lea and MicroLea, two Python libraries implementing the Statues algorithm, along with several use cases.