Uncertainty
Sparse Signal Recovery in the Presence of Intra-Vector and Inter-Vector Correlation
Rao, Bhaskar D., Zhang, Zhilin, Jin, Yuzhe
This work discusses the problem of sparse signal recovery when there is correlation among the values of non-zero entries. We examine intra-vector correlation in the context of the block sparse model and inter-vector correlation in the context of the multiple measurement vector model, as well as their combination. Algorithms based on the sparse Bayesian learning are presented and the benefits of incorporating correlation at the algorithm level are discussed. The impact of correlation on the limits of support recovery is also discussed highlighting the different impact intra-vector and inter-vector correlations have on such limits.
Efficient Methods for Unsupervised Learning of Probabilistic Models
Interpreting neural spike trains, compressing video, identifying features in DNA microarrays, and recognizing particles in high energy physics all rely upon the ability to find and model complex structure in a high dimensional space. Despite their great promise, high dimensional probabilistic models are frequently computationally intractable to work with in practice. In this thesis I develop solutions to overcome this intractability, primarily in the context of energy based models. A common cause of intractability is that model distributions cannot be analytically normalized. Probabilities can only be computed up to a constant, making training exceedingly difficult. To solve this problem I propose'minimum probability flow learning', a variational technique for parameter estimation in such models.
Two New Algorithms for Solving Covariance Graphical Lasso Based on Coordinate Descent and ECM
Covariance graphical lasso applies a lasso penalty on the elements of the covariance matrix. This method is useful because it not only produces sparse estimation of covariance matrix but also discovers marginal independence structures by generating zeros in the covariance matrix. We propose and explore two new algorithms for solving the covariance graphical lasso problem. Our new algorithms are based on coordinate descent and ECM. We show that these two algorithms are more attractive than the only existing competing algorithm of Bien and Tibshirani (2011) in terms of simplicity, speed and stability. We also discuss convergence properties of our algorithms.
Adaptive experimental design for one-qubit state estimation with finite data based on a statistical update criterion
Sugiyama, Takanori, Turner, Peter S., Murao, Mio
For successful experimental implementation of any quantum protocol, the quantum states and operations involved must be confirmed to be sufficiently closed to their theoretical targets. One way to obtain such a confirmation is to perform another experiment and from the obtained data make an estimate of the quantum operator involved. Statistically, this is a constrained multiparameter estimation problem - the quantum estimation problem - where we assume we are given a finite number of identical copies of a quantum state or operation, we perform measurements whose mathematical description is assumed to be known, and from the outcome statistics we make our estimate. Due to the probabilistic behavior of the measurement outcomes and the finiteness of the number of measurement trials, there always exist statistical errors in any quantum estimate. The size of the error depends on the choice of measurements and the estimation procedure. In statistics, the former is called an experimental design, while the latter is called an estimator. It is, therefore, a key aim of both classical and quantum estimation theory to find a combination of experimental design and estimator which gives us more precise estimation results using fewer measurement trials. A standard combination in quantum information experiments is that of quantum tomography and maximum likelihood estimator. Although the term "quantum tomography" can be used in several different contexts, we use it to mean an experimental design in which an independently and identically prepared set of measurements are used throughout the entire experiment [1].
An improved approach to attribute reduction with covering rough sets
Wang, Changzhong, Sun, Baiqing, Hu, Qinhua
Attribute reduction is viewed as an important preprocessing step for pattern recognition and data mining. Most of researches are focused on attribute reduction by using rough sets. Recently, Tsang et al. discussed attribute reduction with covering rough sets in the paper [E. C.C. Tsang, D. Chen, Daniel S. Yeung, Approximations and reducts with covering generalized rough sets, Computers and Mathematics with Applications 56 (2008) 279-289], where an approach based on discernibility matrix was presented to compute all attribute reducts. In this paper, we provide an improved approach by constructing simpler discernibility matrix with covering rough sets, and then proceed to improve some characterizations of attribute reduction provided by Tsang et al. It is proved that the improved discernible matrix is equivalent to the old one, but the computational complexity of discernible matrix is greatly reduced.
Counting Belief Propagation
Kersting, Kristian, Ahmadi, Babak, Natarajan, Sriraam
A major benefit of graphical models is that most knowledge is captured in the model structure. Many models, however, produce inference problems with a lot of symmetries not reflected in the graphical structure and hence not exploitable by efficient inference techniques such as belief propagation (BP). In this paper, we present a new and simple BP algorithm, called counting BP, that exploits such additional symmetries. Starting from a given factor graph, counting BP first constructs a compressed factor graph of clusternodes and clusterfactors, corresponding to sets of nodes and factors that are indistinguishable given the evidence. Then it runs a modified BP algorithm on the compressed graph that is equivalent to running BP on the original factor graph. Our experiments show that counting BP is applicable to a variety of important AI tasks such as (dynamic) relational models and boolean model counting, and that significant efficiency gains are obtainable, often by orders of magnitude.
Learning Continuous-Time Social Network Dynamics
Fan, Yu, Shelton, Christian R.
We demonstrate that a number of sociology models for social network dynamics can be viewed as continuous time Bayesian networks (CTBNs). A sampling-based approximate inference method for CTBNs can be used as the basis of an expectation-maximization procedure that achieves better accuracy in estimating the parameters of the model than the standard method of moments algorithmfromthe sociology literature. We extend the existing social network models to allow for indirect and asynchronous observations of the links. A Markov chain Monte Carlo sampling algorithm for this new model permits estimation and inference. We provide results on both a synthetic network (for verification) and real social network data.
On Smoothing and Inference for Topic Models
Asuncion, Arthur, Welling, Max, Smyth, Padhraic, Teh, Yee Whye
Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical comparisons. In this paper, we highlight the close connections between these approaches. We find that the main differences are attributable to the amount of smoothing applied to the counts. When the hyperparameters are optimized, the differences in performance among the algorithms diminish significantly. The ability of these algorithms to achieve solutions of comparable accuracy gives us the freedom to select computationally efficient approaches. Using the insights gained from this comparative study, we show how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
Bayesian Discovery of Linear Acyclic Causal Models
Hoyer, Patrik O., Hyttinen, Antti
Methods for automated discovery of causal relationships from non-interventional data have received much attention recently. A widely used and well understood model family is given by linear acyclic causal models (recursive structural equation models). For Gaussian data both constraint-based methods (Spirtes et al., 1993; Pearl, 2000) (which output a single equivalence class) and Bayesian score-based methods (Geiger and Heckerman, 1994) (which assign relative scores to the equivalence classes) are available. On the contrary, all current methods able to utilize non-Gaussianity in the data (Shimizu et al., 2006; Hoyer et al., 2008) always return only a single graph or a single equivalence class, and so are fundamentally unable to express the degree of certainty attached to that output. In this paper we develop a Bayesian score-based approach able to take advantage of non-Gaussianity when estimating linear acyclic causal models, and we empirically demonstrate that, at least on very modest size networks, its accuracy is as good as or better than existing methods. We provide a complete code package (in R) which implements all algorithms and performs all of the analysis provided in the paper, and hope that this will further the application of these methods to solving causal inference problems.
Interpretation and Generalization of Score Matching
Score matching is a recently developed parameter learning method that is particularly effective to complicated high dimensional density models with intractable partition functions. In this paper, we study two issues that have not been completely resolved for score matching. First, we provide a formal link between maximum likelihood and score matching. Our analysis shows that score matching finds model parameters that are more robust with noisy training data. Second, we develop a generalization of score matching. Based on this generalization, we further demonstrate an extension of score matching to models of discrete data.