Bayesian Inference
Mixture-based Multiple Imputation Model for Clinical Data with a Temporal Dimension
Xue, Ye, Klabjan, Diego, Luo, Yuan
The problem of missing values in multivariable time series is a key challenge in many applications such as clinical data mining. Although many imputation methods show their effectiveness in many applications, few of them are designed to accommodate clinical multivariable time series. In this work, we propose a multiple imputation model that captures both cross-sectional information and temporal correlations. We integrate Gaussian processes with mixture models and introduce individualized mixing weights to handle the variance of predictive confidence of Gaussian process models. The proposed model is compared with several state-of-the-art imputation algorithms on both real-world and synthetic datasets. Experiments show that our best model can provide more accurate imputation than the benchmarks on all of our datasets.
INTRODUCTION
Computational modeling in clinical applications attracts growing interest with the realization that the quantitative understanding of patient pathophysiological progression is crucial to clinical studies [1]. With comprehensive and precise modeling, we can have a better understanding of a patient's state, offer more precise diagnosis and provide better individualized therapies [2]. Researchers are increasingly motivated to build more accurate computational models from multiple types of clinical data. However, missing values in clinical data challenge researchers using analytic techniques for modeling, as many of the techniques are designed for complete data. Traditional strategies used in clinical studies to handle missing values include deleting records with missing values and imputing missing entries by mean values. However, deleting records with missing values and some other filtering strategies can introduce biases [3] that can impact modeling in many ways, thus limiting its generalizability. Mean imputation is widely used by researchers to handle missing values.
However, it is shown to yield less effective estimates than many other modern imputation techniques [4]-[7], such as maximum likelihood approaches and multiple imputation methods (e.g.
A Noise-Robust Fast Sparse Bayesian Learning Model
Helgøy, Ingvild M., Li, Yushu
This paper utilizes the hierarchical model structure from the Bayesian Lasso in the Sparse Bayesian Learning process to develop a new type of probabilistic supervised learning approach. This approach has several performance advantages, such as being fast, sparse and especially robust to the variance in random noise. The hierarchical model structure in this Bayesian framework is designed in such a way that the priors not only penalize the unnecessary complexity of the model but also depend on the variance of the random noise in the data. The hyperparameters in the model are estimated by the Fast Marginal Likelihood Maximization algorithm, which yields a low computational cost and a faster learning process. We compare our methodology with two other popular Sparse Bayesian Learning models: the Relevance Vector Machine and a sparse Bayesian model that has been used for signal reconstruction in compressive sensing. We show that our method will generally provide sparser solutions and be more flexible and stable when data is polluted by high variance noise.
Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective
Can an arbitrarily intelligent reinforcement learning agent be kept under control by a human user? Or do agents with sufficient intelligence inevitably find ways to shortcut their reward signal? This question impacts how far reinforcement learning can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we use an intuitive yet precise graphical model called causal influence diagrams to formalize reward tampering problems. We also describe a number of modifications to the reinforcement learning objective that prevent incentives for reward tampering. We verify the solutions using recently developed graphical criteria for inferring agent incentives from causal influence diagrams. Along the way, we also compare corrigibility and self-preservation properties of the various solutions, and discuss how they can be combined into a single agent without reward tampering incentives.
Quantum Expectation-Maximization for Gaussian Mixture Models
Kerenidis, Iordanis, Luongo, Alessandro, Prakash, Anupam
The Expectation-Maximization (EM) algorithm is a fundamental tool in unsupervised machine learning. It is often used as an efficient way to solve Maximum Likelihood (ML) estimation problems, especially for models with latent variables. It is also the algorithm of choice to fit mixture models: generative models that represent unlabelled points originating from $k$ different processes, as samples from $k$ multivariate distributions. In this work we define and use a quantum version of EM to fit a Gaussian Mixture Model. Given quantum access to a dataset of $n$ vectors of dimension $d$, our algorithm has convergence and precision guarantees similar to the classical algorithm, but the runtime is only polylogarithmic in the number of elements in the training set, and is polynomial in other parameters, such as the dimension of the feature space and the number of components in the mixture. We further generalize the algorithm in two directions. First, we show how to fit any mixture model of probability distributions in the exponential family. Then, we show how to use this algorithm to compute the Maximum a Posteriori (MAP) estimate of a mixture model: the Bayesian approach to likelihood estimation problems. We discuss the performance of the algorithm on datasets that are expected to be classified successfully by those algorithms, arguing that in those cases we can give strong guarantees on the runtime.
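For orientation, the classical EM loop that the quantum algorithm is compared against can be sketched as follows. This is a minimal illustration for a one-dimensional mixture, not the authors' quantum procedure; the function name `fit_gmm_1d`, the initialisation scheme, and the synthetic data are assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, iterations=50):
    """Classical EM for a 1-D Gaussian mixture with k components (illustrative)."""
    lo, hi = min(data), max(data)
    # Assumed initialisation: spread the means evenly over the data range.
    means = [lo + (hi - lo) * j / max(k - 1, 1) for j in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            scores = [w * normal_pdf(x, m, v)
                      for w, m, v in zip(weights, means, variances)]
            total = sum(scores) or 1e-300  # guard against numerical underflow
            resp.append([s / total for s in scores])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component received no mass; leave it unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a degenerate component
    return weights, means, variances


# Synthetic data: two well-separated clusters (values chosen for illustration).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
weights, means, variances = fit_gmm_1d(data)
```

Each iteration touches every data point, so the classical cost grows linearly with the number of points, which is the dependence the abstract says the quantum version reduces to polylogarithmic.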
Consistent Community Detection in Continuous-Time Networks of Relational Events
Arastuie, Makan, Paul, Subhadeep, Xu, Kevin S.
In many application settings involving networks, such as messages between users of an on-line social network or transactions between traders in financial markets, the observed data are in the form of relational events with timestamps, which form a continuous-time network. We propose the Community Hawkes Independent Pairs (CHIP) model for community detection on such timestamped relational event data. We demonstrate that applying spectral clustering to adjacency matrices constructed from relational events generated by the CHIP model provides consistent community detection for a growing number of nodes. In particular, we obtain explicit non-asymptotic upper bounds on the misclustering rates based on the separation conditions required on the parameters of the model for consistent community detection. We also develop consistent and computationally efficient estimators for the parameters of the model. We demonstrate that our proposed CHIP model and estimation procedure scale to large networks with tens of thousands of nodes and provide superior fits compared to existing continuous-time network models on several real networks.
Music Transcription Based on Bayesian Piece-Specific Score Models Capturing Repetitions
Nakamura, Eita, Yoshii, Kazuyoshi
Most work on models for music transcription has focused on describing local sequential dependence of notes in musical scores and failed to capture their global repetitive structure, which can be a useful guide for transcribing music. Focusing on the rhythm, we formulate several classes of Bayesian Markov models of musical scores that describe repetitions indirectly by sparse transition probabilities of notes or note patterns. This enables us to construct piece-specific models for unseen scores with unfixed repetitive structure and to derive tractable inference algorithms. Moreover, to describe approximate repetitions, we explicitly incorporate a process of modifying the repeated notes/note patterns. We apply these models as a prior music language model for rhythm transcription, where piece-specific score models are inferred from performed MIDI data by unsupervised learning, in contrast to the conventional supervised construction of score models. Evaluations using vocal melodies of popular music showed that the Bayesian models improved the transcription accuracy for most of the tested model types, indicating the universal efficacy of the proposed approach.
INTRODUCTION
Music transcription is an actively studied but still unsolved problem in music information processing [1], [2]. One of the goals of music transcription is to convert a music performance signal into a human-readable symbolic musical score. While recent studies have achieved highly accurate pitch detection [3]-[7], it is also necessary to transcribe rhythms in order to obtain symbolic music representation [8]-[18]. Since there are many logically possible representations of rhythms (including ones that are meaningless to humans) for a given performance [11], using a score model that describes prior knowledge about musical scores is key to solving this problem.
A common approach for music transcription is to integrate a musical score (language) model and a performance/acoustic model to obtain a proper transcription that best fits an input performance signal, similarly to the method of statistical speech recognition. More recently, end-to-end approaches have also been attempted [19]-[21], which have been of limited success so far. This work was supported partially by JSPS KAKENHI (Nos. The work of EN was supported by the JSPS research fellowship (PD).
Assessing the Safety and Reliability of Autonomous Vehicles from Road Testing
Zhao, Xingyu, Robu, Valentin, Flynn, David, Salako, Kizito, Strigini, Lorenzo
Although we have focused on the "hot" area of AVs, our discussion and the novel CBI theorems are more generally applicable. We see them as especially useful now for ML-based systems with critical applications, although not with extreme requirements, since assurance in these systems must rely on combinations of statistical evidence with other verification methods that are, as yet, not well-established.
APPENDIX A. Statement and Proof of CBI Theorem 1
Problem: Consider the set $D$ of all probability distributions defined over the unit interval, each distribution representing a potential prior distribution of pfm values for an AV. For $0 < p_l \le \epsilon < 1$, we seek a prior distribution that minimises the posterior confidence in a reliability bound $p \in [p_l, 1]$, given $k$ fatalities have occurred over $n$ miles driven and subject to constraints on some quantiles of the prior distribution. That is, for $\theta \in (0, 1]$, we solve
$$\min_{D} \Pr(X \le p \mid k \,\&\, n) \quad \text{subject to} \quad \Pr(X \le \epsilon) = \theta, \quad \Pr(X \ge p_l) = 1.$$
Solution: There is a prior in $D$ that minimises the posterior confidence: the 2-point distribution
$$\Pr(X = x) = \theta \, \mathbf{1}_{x = x_1} + (1 - \theta) \, \mathbf{1}_{x = x_3},$$
where $p_l \le x_1 \le \epsilon < x_3$, and the values of $x_1$ and $x_3$ both depend on the model parameters (i.e.
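As a numerical illustration of the two-point prior in the solution above, the posterior confidence for any bound lying between the two support points follows from Bayes' rule. The sketch below assumes a per-mile Bernoulli likelihood, which is an assumption of this example rather than something stated in the excerpt, and it does not perform the theorem's optimisation: the support points `x1` and `x3` and all numerical values are hypothetical.

```python
def posterior_confidence(theta, x1, x3, k, n):
    """Pr(X <= p | k, n) for any p with x1 <= p < x3 under a two-point prior.

    The prior puts mass theta on x1 and mass (1 - theta) on x3.
    """
    def likelihood(x):
        # Assumed model: each mile is an independent Bernoulli trial with
        # fatality probability x. The binomial coefficient cancels in the ratio.
        return (x ** k) * ((1.0 - x) ** (n - k))

    numerator = theta * likelihood(x1)
    denominator = numerator + (1.0 - theta) * likelihood(x3)
    return numerator / denominator


# Hypothetical inputs: prior confidence 0.1, no fatalities in one million miles.
theta = 0.1
x1, x3 = 1e-8, 1e-6
confidence_after = posterior_confidence(theta, x1, x3, k=0, n=1_000_000)
confidence_no_data = posterior_confidence(theta, x1, x3, k=0, n=0)
```

With no miles driven the posterior confidence equals the prior confidence `theta`, and fatality-free mileage raises it, because the likelihood favours the smaller support point.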
Prune Sampling: a MCMC inference technique for discrete and deterministic Bayesian networks
Phillipson, Frank, Parie, Jurriaan, Weikamp, Ron
We introduce and characterise the performance of the Markov chain Monte Carlo (MCMC) inference method Prune Sampling for discrete and deterministic Bayesian networks (BNs). We developed a procedure to obtain the performance of an MCMC sampling method in the limit of infinite simulation time, extrapolated from relatively short simulations. This approach was used to conduct a study to compare the accuracy, rate of convergence and time consumption of Prune Sampling with two conventional MCMC sampling methods: Gibbs and Metropolis sampling. We show that Markov chains created by Prune Sampling always converge to the desired posterior distribution, even for networks where conventional Gibbs sampling fails. In addition, we demonstrate that pruning outperforms Gibbs sampling, at least for a certain class of BNs. However, this advantage comes at a price. In the first version of Prune Sampling, the procedure to choose the next iteration step uniformly is rather time intensive for large BNs. Our conclusion is that Prune Sampling is a competitive method for all types of small and medium sized BNs, but (for now) standard methods still perform better for all types of large BNs.
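The failure of Gibbs sampling on deterministic networks mentioned above can be reproduced with a two-node toy network. The sketch below is an illustration constructed for this purpose, not an example from the paper, and it does not implement Prune Sampling: node A is a fair coin and node B deterministically copies A, so every single-variable conditional is 0 or 1 and the chain cannot leave its starting state.

```python
import random


def gibbs_deterministic_pair(steps, seed=0):
    """Gibbs sampling on A ~ Bernoulli(0.5) with the deterministic link B = A."""
    rng = random.Random(seed)
    a, b = 0, 0  # a valid (non-zero probability) starting state
    samples = []
    for _ in range(steps):
        # P(A = 1 | B = b) is 1 if b == 1 and 0 otherwise, because B copies A.
        a = 1 if rng.random() < (1.0 if b == 1 else 0.0) else 0
        # P(B = 1 | A = a) is likewise exactly 0 or 1.
        b = 1 if rng.random() < (1.0 if a == 1 else 0.0) else 0
        samples.append((a, b))
    return samples


samples = gibbs_deterministic_pair(steps=1000)
# Estimated marginal P(A = 1); the true value under the model is 0.5.
estimated_p_a = sum(a for a, _ in samples) / len(samples)
```

The chain stays at (0, 0) forever, so the estimate of P(A = 1) is 0 instead of 0.5. Reaching the other valid state (1, 1) would require changing both variables at once, which a single-site Gibbs update cannot do.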
Mixed pooling of seasonality in time series pallet forecasting
Multiple seasonal patterns play a key role in time series forecasting, especially for business time series where seasonal effects are often dramatic. Previous approaches including Fourier decomposition, exponential smoothing, and seasonal autoregressive integrated moving average (SARIMA) models do not reflect the distinct characteristics of each period in seasonal patterns, such as the unique behavior of specific days of the week in business data. We propose a multi-dimensional hierarchical model. Intermediate parameters for each seasonal period are first estimated, and a mixture of intermediate parameters is then taken, resulting in a model that successfully reflects the interactions between multiple seasonal patterns. Although this process reduces the data available for each parameter, a robust estimation can be obtained through a hierarchical Bayesian model implemented in Stan. Through this model, it becomes possible to consider both the characteristics of each seasonal period and the interactions among characteristics from multiple seasonal periods. Our new model achieved considerable improvements in prediction accuracy compared to previous models, including Fourier decomposition, which Prophet uses to model seasonality patterns. A comparison was performed on a real-world dataset of pallet transport from a national-scale logistic network.
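The remark that hierarchical modelling compensates for the reduced data per parameter can be illustrated with the textbook normal-normal partial-pooling formula. This is a deliberately simplified sketch with known variances, not the authors' Stan model; the function name and all numbers are hypothetical.

```python
def partial_pooling_mean(group_mean, n_obs, noise_var, prior_mean, prior_var):
    """Posterior mean of a group effect under a normal-normal hierarchy.

    Assumes observations ~ Normal(effect, noise_var) and
    effect ~ Normal(prior_mean, prior_var), with both variances known.
    """
    data_precision = n_obs / noise_var
    prior_precision = 1.0 / prior_var
    return (data_precision * group_mean + prior_precision * prior_mean) / (
        data_precision + prior_precision
    )


# Two seasonal periods with the same observed mean but different sample sizes.
sparse_period = partial_pooling_mean(120.0, n_obs=2, noise_var=100.0,
                                     prior_mean=100.0, prior_var=25.0)
dense_period = partial_pooling_mean(120.0, n_obs=50, noise_var=100.0,
                                    prior_mean=100.0, prior_var=25.0)
```

The period with only two observations is pulled strongly toward the shared mean of 100, while the period with fifty observations stays close to its own mean of 120. This shrinkage is what keeps estimates stable when a seasonal period has little data.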
Distributionally Robust Optimization: A Review
Rahimian, Hamed, Mehrotra, Sanjay
The concepts of risk-aversion, chance-constrained optimization, and robust optimization have developed significantly over the last decade. The statistical learning community has also witnessed rapid theoretical and applied growth by relying on these concepts. A modeling framework, called distributionally robust optimization (DRO), has recently received significant attention in both the operations research and statistical learning communities. This paper surveys the main concepts of and contributions to DRO, and its relationships with robust optimization, risk-aversion, chance-constrained optimization, and function regularization.