Directed Networks
Multiscale Dictionary Learning for Estimating Conditional Distributions
Petralia, Francesca, Vogelstein, Joshua, Dunson, David B.
Massive datasets are becoming an ubiquitous byproduct of modern scientific and industrial applications. These data present statistical and computational challenges because many previously developed analysis approaches do not scaleup sufficiently. Challenges arise because of the ultra high-dimensionality and relatively low sample size. Parsimonious models for such big data assume that the density in the ambient space concentrates around a lower-dimensional (possibly nonlinear) subspace. A plethora of methods are emerging to estimate such lower-dimensional subspaces [25, 2]. 1 We are interested in using such lower-dimensional embeddings to obtain estimates of the conditional distribution of some target variable(s). This conditional density estimation setting arises in a number of important application areas, including neuroscience, genetics, and video processing. For example, one might desire automated estimation of a predictive density for a neurologic phenotype of interest, such as intelligence, on the basis of available data for a patient including neuroimaging.
A Junction Tree Framework for Undirected Graphical Model Selection
Vats, Divyanshu, Nowak, Robert
An undirected graphical model is a joint probability distribution defined on an undirected graph G*, where the vertices in the graph index a collection of random variables and the edges encode conditional independence relationships among random variables. The undirected graphical model selection (UGMS) problem is to estimate the graph G* given observations drawn from the undirected graphical model. This paper proposes a framework for decomposing the UGMS problem into multiple subproblems over clusters and subsets of the separators in a junction tree. The junction tree is constructed using a graph that contains a superset of the edges in G*. We highlight three main properties of using junction trees for UGMS. First, different regularization parameters or different UGMS algorithms can be used to learn different parts of the graph. This is possible since the subproblems we identify can be solved independently of each other. Second, under certain conditions, a junction tree based UGMS algorithm can produce consistent results with fewer observations than the usual requirements of existing algorithms. Third, both our theoretical and experimental results show that the junction tree framework does a significantly better job at finding the weakest edges in a graph than existing methods. This property is a consequence of both the first and second properties. Finally, we note that our framework is independent of the choice of the UGMS algorithm and can be used as a wrapper around standard UGMS algorithms for more accurate graph estimation.
Scalable and Efficient Bayes-Adaptive Reinforcement Learning Based on Monte-Carlo Tree Search
Guez, A., Silver, D., Dayan, P.
Bayesian planning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, planning optimally in the face of uncertainty is notoriously taxing, since the search space is enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach avoids expensive applications of Bayes rule within the search tree by sampling models from current beliefs, and furthermore performs this sampling in a lazy manner. This enables it to outperform previous Bayesian model-based reinforcement learning algorithms by a significant margin on several well-known benchmark problems. As we show, our approach can even work in problems with an infinite state space that lie qualitatively out of reach of almost all previous work in Bayesian exploration.
One-Class Classification: Taxonomy of Study and Review of Techniques
Khan, Shehroz S., Madden, Michael G.
One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.
Viterbi training in PRISM
Sato, Taisuke, Kubota, Keiichi
VT (Viterbi training), or hard EM, is an efficient way of parameter learning for probabilistic models with hidden variables. Given an observation $y$, it searches for a state of hidden variables $x$ that maximizes $p(x,y \mid \theta)$ by coordinate ascent on parameters $\theta$ and $x$. In this paper we introduce VT to PRISM, a logic-based probabilistic modeling system for generative models. VT improves PRISM in three ways. First VT in PRISM converges faster than EM in PRISM due to the VT's termination condition. Second, parameters learned by VT often show good prediction performance compared to those learned by EM. We conducted two parsing experiments with probabilistic grammars while learning parameters by a variety of inference methods, i.e.\ VT, EM, MAP and VB. The result is that VT achieved the best parsing accuracy among them in both experiments. Also we conducted a similar experiment for classification tasks where a hidden variable is not a prediction target unlike probabilistic grammars. We found that in such a case VT does not necessarily yield superior performance. Third since VT always deals with a single probability of a single explanation, Viterbi explanation, the exclusiveness condition that is imposed on PRISM programs is no more required if we learn parameters by VT. Last but not least we can say that as VT in PRISM is general and applicable to any PRISM program, it largely reduces the need for the user to develop a specific VT algorithm for a specific model. Furthermore since VT in PRISM can be used just by setting a PRISM flag appropriately, it makes VT easily accessible to (probabilistic) logic programmers. To appear in Theory and Practice of Logic Programming (TPLP).
On Approximate Inference for Generalized Gaussian Process Models
Shang, Lifeng, Chan, Antoni B.
A generalized Gaussian process model (GGPM) is a unifying framework that encompasses many existing Gaussian process (GP) models, such as GP regression, classification, and counting. In the GGPM framework, the observation likelihood of the GP model is itself parameterized using the exponential family distribution (EFD). In this paper, we consider efficient algorithms for approximate inference on GGPMs using the general form of the EFD. A particular GP model and its associated inference algorithms can then be formed by changing the parameters of the EFD, thus greatly simplifying its creation for task-specific output domains. We demonstrate the efficacy of this framework by creating several new GP models for regressing to non-negative reals and to real intervals. We also consider a closed-form Taylor approximation for efficient inference on GGPMs, and elaborate on its connections with other model-specific heuristic closed-form approximations. Finally, we present a comprehensive set of experiments to compare approximate inference algorithms on a wide variety of GGPMs.
Unsupervised Sub-tree Alignment for Tree-to-Tree Translation
This article presents a probabilistic sub-tree alignment model and its application to tree-to-tree machine translation. Unlike previous work, we do not resort to surface heuristics or expensive annotated data, but instead derive an unsupervised model to infer the syntactic correspondence between two languages. More importantly, the developed model is syntactically-motivated and does not rely on word alignments. As a by-product, our model outputs a sub-tree alignment matrix encoding a large number of diverse alignments between syntactic structures, from which machine translation systems can efficiently extract translation rules that are often filtered out due to the errors in 1-best alignment. Experimental results show that the proposed approach outperforms three state-of-the-art baseline approaches in both alignment accuracy and grammar quality. When applied to machine translation, our approach yields a +1.0 BLEU improvement and a -0.9 TER reduction on the NIST machine translation evaluation corpora. With tree binarization and fuzzy decoding, it even outperforms a state-of-the-art hierarchical phrase-based system.
Learning Pairwise Graphical Models with Nonlinear Sufficient Statistics
Yuan, Xiao-Tong, Li, Ping, Zhang, Tong
We investigate a generic problem of learning pairwise exponential family graphical models with pairwise sufficient statistics defined by a global mapping function, e.g., Mercer kernels. This subclass of pairwise graphical models allow us to flexibly capture complex interactions among variables beyond pairwise product. We propose two $\ell_1$-norm penalized maximum likelihood estimators to learn the model parameters from i.i.d. samples. The first one is a joint estimator which estimates all the parameters simultaneously. The second one is a node-wise conditional estimator which estimates the parameters individually for each node. For both estimators, we show that under proper conditions the extra flexibility gained in our model comes at almost no cost of statistical and computational efficiency. We demonstrate the advantages of our model over state-of-the-art methods on synthetic and real datasets.
Nonparametric Bayesian models of hierarchical structure in complex networks
Schmidt, Mikkel N., Herlau, Tue, Mørup, Morten
Analyzing and understanding the structure of complex relational data is important in many applications including analysis of the connectivity in the human brain. Such networks can have prominent patterns on different scales, calling for a hierarchically structured model. We propose two non-parametric Bayesian hierarchical network models based on Gibbs fragmentation tree priors, and demonstrate their ability to capture nested patterns in simulated networks. On real networks we demonstrate detection of hierarchical structure and show predictive performance on par with the state of the art. We envision that our methods can be employed in exploratory analysis of large scale complex networks for example to model human brain connectivity.
Streaming Variational Bayes
Broderick, Tamara, Boyd, Nicholas, Wibisono, Andre, Wilson, Ashia C., Jordan, Michael I.
We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a user-specified approximation batch primitive. We demonstrate the usefulness of our framework, with variational Bayes (VB) as the primitive, by fitting the latent Dirichlet allocation model to two large-scale document collections. We demonstrate the advantages of our algorithm over stochastic variational inference (SVI) by comparing the two after a single pass through a known amount of data---a case where SVI may be applied---and in the streaming setting, where SVI does not apply.