Learning Graphical Models
A high-reproducibility and high-accuracy method for automated topic classification
Lancichinetti, Andrea, Sirer, M. Irmak, Wang, Jane X., Acuna, Daniel, Körding, Konrad, Amaral, Luís A. Nunes
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent search, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state-of-the-art in topic classification. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results which are not accurate in inferring the most suitable model parameters. Adapting approaches for community detection in networks, we propose a new algorithm which displays high-reproducibility and high-accuracy, and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure. Our algorithm promises to make "big data" text analysis systems more reliable.
DinTucker: Scaling up Gaussian process models on multidimensional arrays with billions of elements
Zhe, Shandian, Qi, Yuan, Park, Youngja, Molloy, Ian, Chari, Suresh
Infinite Tucker Decomposition (InfTucker) and random function prior models, as nonparametric Bayesian models on infinite exchangeable arrays, are more powerful models than widely-used multilinear factorization methods including Tucker and PARAFAC decomposition, (partly) due to their capability of modeling nonlinear relationships between array elements. Despite their great predictive performance and sound theoretical foundations, they cannot handle massive data due to a prohibitively high training time. To overcome this limitation, we present Distributed Infinite Tucker (DINTUCKER), a large-scale nonlinear tensor decomposition algorithm on MAPREDUCE. While maintaining the predictive accuracy of InfTucker, it is scalable on massive data. DINTUCKER is based on a new hierarchical Bayesian model that enables local training of InfTucker on subarrays and information integration from all local training results. We use distributed stochastic gradient descent, coupled with variational inference, to train this model. We apply DINTUCKER to multidimensional arrays with billions of elements from applications in the "Read the Web" project (Carlson et al., 2010) and in information security and compare it with the state-of-the-art large-scale tensor decomposition method, GigaTensor. On both datasets, DINTUCKER achieves significantly higher prediction accuracy with less computational time.
Marginal and simultaneous predictive classification using stratified graphical models
Nyman, Henrik, Xiong, Jie, Pensar, Johan, Corander, Jukka
Supervised classification is one of the most common tasks considered in machine learning and statistics (Bishop, 2007; Duda et al., 2000; Hastie et al., 2009; Ripley, 1996), with a wide variety of applications over practically all fields of science and engineering. Today, there exists a myriad of different classification methods, out of which those based on probabilistic models are widely accepted as the most sensible way to solve classification problems. Probabilistic methods are often themselves classified as either generative or discriminative, depending on whether one directly models the class posterior distribution (discriminative classifiers) or first the joint distribution of observed features (variables) conditional on class training data and then the posterior distribution of labels is obtained through Bayes' rule. There has been a debate around which of these approaches should be preferred in a particular application, see Ripley (1996), Hastie et al. (2009), Bishop (2007), and Pernkopf and Bilmes (2005), however, both classes of methods continue to be supported and further developed. One of the popular methods of probabilistic classification is based on encoding feature dependencies with Bayesian networks (Friedman et al., 1997). Such models can often represent data structures more faithfully than the naive Bayes classifier, which has been shown to yield dramatic improvements in classification accuracy in some cases. Numerous variants and extensions of the original framework introduced by Friedman et al. (1997) have been considered over the years, e.g.
Bayesian nonparametric comorbidity analysis of psychiatric disorders
Ruiz, Francisco J. R., Valera, Isabel, Blanco, Carlos, Perez-Cruz, Fernando
The analysis of comorbidity is an open and complex research field in the branch of psychiatry, where clinical experience and several studies suggest that the relation among the psychiatric disorders may have etiological and treatment implications. In this paper, we are interested in applying latent feature modeling to find the latent structure behind the psychiatric disorders that can help to examine and explain the relationships among them. To this end, we use the large amount of information collected in the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database and propose to model these data using a nonparametric latent model based on the Indian Buffet Process (IBP). Due to the discrete nature of the data, we first need to adapt the observation model for discrete random variables. We propose a generative model in which the observations are drawn from a multinomial-logit distribution given the IBP matrix. The implementation of an efficient Gibbs sampler is accomplished using the Laplace approximation, which allows integrating out the weighting factors of the multinomial-logit likelihood model. We also provide a variational inference algorithm for this model, which provides a complementary (and less expensive in terms of computational complexity) alternative to the Gibbs sampler allowing us to deal with a larger number of data. Finally, we use the model to analyze comorbidity among the psychiatric disorders diagnosed by experts from the NESARC database.
Tempering by Subsampling
van de Meent, Jan-Willem, Paige, Brooks, Wood, Frank
In this paper we demonstrate that tempering Markov chain Monte Carlo samplers for Bayesian models by recursively subsampling observations without replacement can improve the performance of baseline samplers in terms of effective sample size per computation. We present two tempering by subsampling algorithms, subsampled parallel tempering and subsampled tempered transitions. We provide an asymptotic analysis of the computational cost of tempering by subsampling, verify that tempering by subsampling costs less than traditional tempering, and demonstrate both algorithms on Bayesian approaches to learning the mean of a high dimensional multivariate Normal and estimating Gaussian process hyperparameters.
Bayesian Properties of Normalized Maximum Likelihood and its Fast Computation
Barron, Andrew, Roos, Teemu, Watanabe, Kazuho
The normalized maximized likelihood (NML) provides the minimax regret solution in universal data compression, gambling, and prediction, and it plays an essential role in the minimum description length (MDL) method of statistical modeling and estimation. Here we show that the normalized maximum likelihood has a Bayes-like representation as a mixture of the component models, even in finite samples, though the weights of linear combination may be both positive and negative. This representation addresses in part the relationship between MDL and Bayes modeling. This representation has the advantage of speeding the calculation of marginals and conditionals required for coding and prediction applications.
Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts
Nguyen, Vu, Phung, Dinh, Nguyen, XuanLong, Venkatesh, Svetha, Bui, Hung Hai
We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partitions groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way: integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. We provide a Polya-urn view of the model and an efficient collapsed Gibbs inference procedure. Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains.
Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units
A deep belief network (DBN) (Hinton et al., 2006) is a layered stochastic network with undirected bipartite interactions between the units in the top two layers, and directed bipartite interactions between the units in all other subsequent pairs of layers, directed towards the bottom layer. The top two layers form a restricted Boltzmann machine (RBM) (Smolensky, 1986). The entire network defines a model of probability distributions on the states of the units in the bottom layer, the visible layer. When the number of units in every layer has the same order of magnitude, the network is called narrow . The depth refers to the number of layers. Deep network architectures are believed to play a key role in information processing of intelligent agents, see (Bengio, 2009) for an overview on this exciting topic. DBNs were the first deep architectures to be envisaged together with an efficient unsupervised training algorithm (Hinton et al., 2006).
Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs
He, Yangbo, Jia, Jinzhu, Yu, Bin
Graphical models are popular statistical tools which are used to represent dependent or causal complex systems. Statistically equivalent causal or directed graphical models are said to belong to a Markov equivalent class. It is of great interest to describe and understand the space of such classes. However, with currently known algorithms, sampling over such classes is only feasible for graphs with fewer than approximately 20 vertices. In this paper, we design reversible irreducible Markov chains on the space of Markov equivalent classes by proposing a perfect set of operators that determine the transitions of the Markov chain. The stationary distribution of a proposed Markov chain has a closed form and can be computed easily. Specifically, we construct a concrete perfect set of operators on sparse Markov equivalence classes by introducing appropriate conditions on each possible operator. Algorithms and their accelerated versions are provided to efficiently generate Markov chains and to explore properties of Markov equivalence classes of sparse directed acyclic graphs (DAGs) with thousands of vertices. We find experimentally that in most Markov equivalence classes of sparse DAGs, (1) most edges are directed, (2) most undirected subgraphs are small and (3) the number of these undirected subgraphs grows approximately linearly with the number of vertices. The article contains supplement arXiv:1303.0632, http://dx.doi.org/10.1214/13-AOS1125SUPP
Painting Analysis Using Wavelets and Probabilistic Topic Models
Wu, Tong, Polatkan, Gungor, Steel, David, Brown, William, Daubechies, Ingrid, Calderbank, Robert
PAINTING ANALYSIS USING WAVELETS AND PROBABILISTIC TOPIC MODELS Tong Wu, Gungor Polatkan, David Steel, William Brown, Ingrid Daubechies and Robert Calderbank ABSTRACT In this paper, computer-based techniques for stylistic analysis of paintings are applied to the five panels of the 14th century Peruzzi Altarpiece by Giotto di Bondone. Features are extracted by combining a dual-tree complex wavelet transform with a hidden Markov tree (HMT) model. Hierarchical clustering is used to identify stylistic keywords in image patches, and keyword frequencies are calculated for sub-images that each contains many patches. A generative hierarchical Bayesian model learns stylistic patterns of keywords; these patterns are then used to characterize the styles of the sub-images; this in turn, permits to discriminate between paintings. Results suggest that such unsupervised probabilistic topic models can be useful to distill characteristic elements of style. Index Terms -- Painting Analysis, Wavelet Transforms, Hidden Markov Trees, Topic Models, Machine Learning 1. INTRODUCTION In recent years wavelet methods have contributed to art history through their application to forgery detection [1], linking of underdrawing and overpainting [2], and uncovering elements of style [3, 4].