Genre
On-line Bayesian parameter estimation in general non-linear state-space models: A tutorial and new results
Tulsyan, Aditya, Huang, Biao, Gopaluni, R. Bhushan, Forbes, J. Fraser
On-line estimation plays an important role in process control and monitoring. Obtaining a theoretical solution to the simultaneous state-parameter estimation problem for non-linear stochastic systems involves solving complex multi-dimensional integrals that are not amenable to analytical solution. While basic sequential Monte-Carlo (SMC) or particle filtering (PF) algorithms for simultaneous estimation exist, it is well recognized that there is a need for making these on-line algorithms non-degenerate, fast and applicable to processes with missing measurements. To overcome the deficiencies in traditional algorithms, this work proposes a Bayesian approach to on-line state and parameter estimation. Its extension to handle missing data in real-time is also provided. The simultaneous estimation is performed by filtering an extended vector of states and parameters using an adaptive sequential-importance-resampling (SIR) filter with a kernel density estimation method. The approach uses an on-line optimization algorithm based on Kullback-Leibler (KL) divergence to allow adaptation of the SIR filter for combined state-parameter estimation. An optimal tuning rule to control the width of the kernel and the variance of the artificial noise added to the parameters is also proposed. The approach is illustrated through numerical examples.
Thompson Sampling for 1-Dimensional Exponential Family Bandits
Korda, Nathaniel, Kaufmann, Emilie, Munos, Remi
Thompson Sampling has been demonstrated in many complex bandit models, however the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case. Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for 1-dimensional exponential family bandits. Our proof builds on previous work, but also makes extensive use of closed forms for Kullback-Leibler divergence and Fisher information (and thus Jeffreys prior) available in an exponential family. This allow us to give a finite time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right. Moreover our analysis covers some distributions for which no optimistic algorithm has yet been proposed, including heavy-tailed exponential families.
Improving MUC extraction thanks to local search
Grรฉgoire, รric, Lagniez, Jean-Marie, Mazure, Bertrand
ExtractingMUCs(MinimalUnsatisfiableCores)fromanunsatisfiable constraint network is a useful process when causes of unsatisfiability must be understood so that the network can be re-engineered and relaxed to become sat- isfiable. Despite bad worst-case computational complexity results, various MUC- finding approaches that appear tractable for many real-life instances have been proposed. Many of them are based on the successive identification of so-called transition constraints. In this respect, we show how local search can be used to possibly extract additional transition constraints at each main iteration step. The approach is shown to outperform a technique based on a form of model rotation imported from the SAT-related technology and that also exhibits additional transi- tion constraints. Our extensive computational experimentations show that this en- hancement also boosts the performance of state-of-the-art DC(WCORE)-like MUC extractors.
Accuracy of MAP segmentation with hidden Potts and Markov mesh prior models via Path Constrained Viterbi Training, Iterated Conditional Modes and Graph Cut based algorithms
Flesia, Ana Georgina, Baumgartner, Josef, Gimenez, Javier, Martinez, Jorge
In this paper, we study statistical classification accuracy of two different Markov field environments for pixelwise image segmentation, considering the labels of the image as hidden states and solving the estimation of such labels as a solution of the MAP equation. The emission distribution is assumed the same in all models, and the difference lays in the Markovian prior hypothesis made over the labeling random field. The a priori labeling knowledge will be modeled with a) a second order anisotropic Markov Mesh and b) a classical isotropic Potts model. Under such models, we will consider three different segmentation procedures, 2D Path Constrained Viterbi training for the Hidden Markov Mesh, a Graph Cut based segmentation for the first order isotropic Potts model, and ICM (Iterated Conditional Modes) for the second order isotropic Potts model. We provide a unified view of all three methods, and investigate goodness of fit for classification, studying the influence of parameter estimation, computational gain, and extent of automation in the statistical measures Overall Accuracy, Relative Improvement and Kappa coefficient, allowing robust and accurate statistical analysis on synthetic and real-life experimental data coming from the field of Dental Diagnostic Radiography. All algorithms, using the learned parameters, generate good segmentations with little interaction when the images have a clear multimodal histogram. Suboptimal learning proves to be frail in the case of non-distinctive modes, which limits the complexity of usable models, and hence the achievable error rate as well. All Matlab code written is provided in a toolbox available for download from our website, following the Reproducible Research Paradigm.
Minimum Distance Estimation for Robust High-Dimensional Regression
Lozano, Aurรฉlie C., Meinshausen, Nicolai
We propose a minimum distance estimation method for robust regression in sparse high-dimensional settings. The traditional likelihood-based estimators lack resilience against outliers, a critical issue when dealing with high-dimensional noisy data. Our method, Minimum Distance Lasso (MD-Lasso), combines minimum distance functionals, customarily used in nonparametric estimation for their robustness, with l1-regularization for high-dimensional regression. The geometry of MD-Lasso is key to its consistency and robustness. The estimator is governed by a scaling parameter that caps the influence of outliers: the loss per observation is locally convex and close to quadratic for small squared residuals, and flattens for squared residuals larger than the scaling parameter. As the parameter approaches infinity, the estimator becomes equivalent to least-squares Lasso. MD-Lasso enjoys fast convergence rates under mild conditions on the model error distribution, which hold for any of the solutions in a convexity region around the true parameter and in certain cases for every solution. Remarkably, a first-order optimization method is able to produce iterates very close to the consistent solutions, with geometric convergence and regardless of the initialization. A connection is established with re-weighted least-squares that intuitively explains MD-Lasso robustness. The merits of our method are demonstrated through simulation and eQTL data analysis.
Application of three graph Laplacian based semi-supervised learning methods to protein function prediction problem
Protein function prediction is the important problem in modern biology. In this paper, the un-normalized, symmetric normalized, and random walk graph Laplacian based semi-supervised learning methods will be applied to the integrated network combined from multiple networks to predict the functions of all yeast proteins in these multiple networks. These multiple networks are network created from Pfam domain structure, co-participation in a protein complex, protein-protein interaction network, genetic interaction network, and network created from cell cycle gene expression measurements. Multiple networks are combined with fixed weights instead of using convex optimization to determine the combination weights due to high time complexity of convex optimization method. This simple combination method will not affect the accuracy performance measures of the three semi-supervised learning methods. Experiment results show that the un-normalized and symmetric normalized graph Laplacian based methods perform slightly better than random walk graph Laplacian based method for integrated network. Moreover, the accuracy performance measures of these three semi-supervised learning methods for integrated network are much better than the best accuracy performance measures of these three methods for the individual network.
A Reliable Effective Terascale Linear Learning System
Agarwal, Alekh, Chapelle, Olivier, Dudik, Miroslav, Langford, John
We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets, with trillions of features, {The number of features here refers to the number of non-zero entries in the data matrix.} billions of training examples and millions of parameters in an hour using a cluster of 1000 machines. Individually none of the component techniques are new, but the careful synthesis required to obtain an efficient implementation is. The result is, up to our knowledge, the most scalable and efficient linear learning system reported in the literature (as of 2011 when our experiments were conducted). We describe and thoroughly evaluate the components of the system, showing the importance of the various design choices.
Error Rate Bounds in Crowdsourcing Models
Li, Hongwei, Yu, Bin, Zhou, Dengyong
Crowdsourcing is an effective tool for human-powered computation on many tasks challenging for computers. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of hyperplane binary labeling rules under the Dawid-Skene crowdsourcing model. The bounds can be applied to analyze many common prediction methods, including the majority voting and weighted majority voting. These bound results could be useful for controlling the error rate and designing better algorithms. We show that the oracle Maximum A Posterior (MAP) rule approximately optimizes our upper bound on the mean error rate for any hyperplane binary labeling rule, and propose a simple data-driven weighted majority voting (WMV) rule (called one-step WMV) that attempts to approximate the oracle MAP and has a provable theoretical guarantee on the error rate. Moreover, we use simulated and real data to demonstrate that the data-driven EM-MAP rule is a good approximation to the oracle MAP rule, and to demonstrate that the mean error rate of the data-driven EM-MAP rule is also bounded by the mean error rate bound of the oracle MAP rule with estimated parameters plugging into the bound.
Exploring Disease Interactions Using Markov Networks
Haaren, Jan Van (Katholieke Universiteit Leuven) | Davis, Jesse (Katholieke Universiteit Leuven) | Lappenschaar, Martijn (Radboud Universiteit Nijmegen) | Hommersom, Arjen (Radboud Universiteit Nijmegen)
Network medicine is an emerging paradigm for studying the co-occurrence between diseases. While diseases are often interlinked through complex patterns, most of the existing work in this area has focused on studying pairwise relationships between diseases. In this paper, we use a state-of-the-art Markov network learning method to learn interactions between musculoskeletal disorders and cardiovascular diseases and compare this to pairwise approaches. Our experimental results confirm that the sophisticated structure learner produces more accurate models, which can help reveal interesting patterns in the co-occurrence of diseases.
Exploring Disease Interactions Using Markov Networks
Haaren, Jan Van (Katholieke Universiteit Leuven) | Davis, Jesse (Katholieke Universiteit Leuven) | Lappenschaar, Martijn (Radboud Universiteit Nijmegen) | Hommersom, Arjen (Radboud Universiteit Nijmegen)
Network medicine is an emerging paradigm for studying the co-occurrence between diseases. While diseases are often interlinked through complex patterns, most of the existing work in this area has focused on studying pairwise relationships between diseases. In this paper, we use a state-of-the-art Markov network learning method to learn interactions between musculoskeletal disorders and cardiovascular diseases and compare this to pairwise approaches. Our experimental results confirm that the sophisticated structure learner produces more accurate models, which can help reveal interesting patterns in the co-occurrence of diseases.