Goto

Collaborating Authors

 Learning Graphical Models


A Two-Dimensional Topic-Aspect Model for Discovering Multi-Faceted Topics

AAAI Conferences

This paper presents the Topic-Aspect Model (TAM), a Bayesian mixture model which jointly discovers topics and aspects. We broadly define an aspect of a document as a characteristic that spans the document, such as an underlying theme or perspective. Unlike previous models which cluster words by topic or aspect, our model can generate token assignments in both of these dimensions, rather than assuming words come from only one of two orthogonal models. We present two applications of the model. First, we model a corpus of computational linguistics abstracts, and find that the scientific topics identified in the data tend to include both a computational aspect and a linguistic aspect. For example, the computational aspect of GRAMMAR emphasizes parsing, whereas the linguistic aspect focuses on formal languages. Secondly, we show that the model can capture different viewpoints on a variety of topics in a corpus of editorials about the Israeli-Palestinian conflict. We show both qualitative and quantitative improvements in TAM over two other state-of-the-art topic models.


Learning Causal Models of Relational Domains

AAAI Conferences

Methods for discovering causal knowledge from observational data have been a persistent topic of AI research for several decades. Essentially all of this work focuses on knowledge representations for propositional domains. In this paper, we present several key algorithmic and theoretical innovations that extend causal discovery to relational domains. We provide strong evidence that effective learning of causal models is enhanced by relational representations. We present an algorithm, relational PC, that learns causal dependencies in a state-of-the-art relational representation, and we identify the key representational and algorithmic innovations that make the algorithm possible. Finally, we prove the algorithm's theoretical correctness and demonstrate its effectiveness on synthetic and real data sets.


Gaussian Mixture Model with Local Consistency

AAAI Conferences

Gaussian Mixture Model (GMM) is one of the most popular data clustering methods which can be viewed as a linear combination of different Gaussian components. In GMM, each cluster obeys Gaussian distribution and the task of clustering is to group observations into different components through estimating each cluster's own parameters. The Expectation-Maximization algorithm is always involved in such estimation problem. However, many previous studies have shown naturally occurring data may reside on or close to an underlying submanifold. In this paper, we consider the case where the probability distribution is supported on a submanifold of the ambient space. We take into account the smoothness of the conditional probability distribution along the geodesics of data manifold. That is, if two observations are close in intrinsic geometry, their distributions over different Gaussian components are similar. Simply speaking, we introduce a novel method based on manifold structure for data clustering, called Locally Consistent Gaussian Mixture Model (LCGMM). Specifically, we construct a nearest neighbor graph and adopt Kullback-Leibler Divergence as the distance measurement to regularize the objective function of GMM. Experiments on several data sets demonstrate the effectiveness of such regularization.


Structure Learning for Markov Logic Networks with Many Descriptive Attributes

AAAI Conferences

Many machine learning applications that involve relational databases incorporate first-order logic and probability. Markov Logic Networks (MLNs) are a prominent statistical relational model that consist of weighted first order clauses. Many of the current state-of-the-art algorithms for learning MLNs have focused on relatively small datasets with few descriptive attributes, where predicates are mostly binary and the main task is usually prediction of links between entities. This paper addresses what is in a sense a complementary problem: learning the structure of an MLN that models the distribution of discrete descriptive attributes on medium to large datasets, given the links between entities in a relational database. Descriptive attributes are usually nonbinary and can be very informative, but they increase the search space of possible candidate clauses. We present an efficient new algorithm for learning a directed relational model (parametrized Bayes net), which produces an MLN structure via a standard moralization procedure for converting directed models to undirected models. Learning MLN structure in this way is 200-1000 times faster and scores substantially higher in predictive accuracy than benchmark algorithms on three relational databases.


Learning Discriminative Piecewise Linear Models with Boundary Points

AAAI Conferences

We introduce a new discriminative piecewise linear model for classification. A two-step method is developed to construct the model. In the first step, we sample some boundary points that lie between positive and negative data, as well as corresponding directions from negative data to positive data. The sampling result gives a discriminative nonparametric decision surface, which preserves enough information to correctly classify all training data. To simplify this surface, in the second step we propose a nonparametric approach for linear surface segmentation using Dirichlet process mixtures. The final result is a piecewise linear model, in which the number of linear surface pieces is automatically determined by the Bayesian inference according to data. Experiments on both synthetic and real data verify the effectiveness of the proposed model.


Properties of Bayesian Dirichlet Scores to Learn Bayesian Network Structures

AAAI Conferences

As we see later, the mathematical derivations are more elaborate A Bayesian network is a probabilistic graphical model that than those recently introduced for BIC and AIC criteria relies on a structured dependency among random variables (de Campos, Zeng, and Ji 2009), and the reduction in the to represent a joint probability distribution in a compact and search space and cache size are less effective when priors efficient manner. It is composed by a directed acyclic graph are strong, but still relevant. This is expected, as the BIC (DAG) where nodes are associated to random variables and score is known to penalize complex graphs more than BD conditional probability distributions are defined for variables scores do. We show that the search space can be reduced given their parents in the graph. Learning the graph (or without losing the global optimality guarantee and that the structure) of these networks from data is one of the most memory requirements are small in many practical cases.


Learning Spatial-Temporal Varying Graphs with Applications to Climate Data Analysis

AAAI Conferences

An important challenge in understanding climate change is to uncover the dependency relationships between various climate observations and forcing factors. Graphical lasso, a recently proposed L 1 penalty based structure learning algorithm, has been proven successful for learning underlying dependency structures for the data drawn from a multivariate Gaussian distribution. However, climatological data often turn out to be non-Gaussian, e.g. cloud cover, precipitation, etc. In this paper, we examine nonparametric learning methods to address this challenge. In particular, we develop a methodology to learn dynamic graph structures from spatial-temporal data so that the graph structures at adjacent time or locations are similar. Experimental results demonstrate that our method not only recovers the underlying graph well but also captures the smooth variation properties on both synthetic data and climate data. An important challenge in understanding climate change is to uncover the dependency relationships between various climate observations and forcing factors. Graphical lasso, a recently proposed An important challenge in understanding climate change is to uncover the dependency relationships between various climate observations and forcing factors. Graphical lasso, a recently proposed L 1 penalty based structure learning algorithm, has been proven successful for learning underlying dependency structures for the data drawn from a multivariate Gaussian distribution. However, climatological data often turn out to be non-Gaussian, e.g. cloud cover, precipitation, etc. In this paper, we examine nonparametric learning methods to address this challenge. In particular, we develop a methodology to learn dynamic graph structures from spatial-temporal data so that the graph structures at adjacent time or locations are similar. Experimental results demonstrate that our method not only recovers the underlying graph well but also captures the smooth variation properties on both synthetic data and climate data.


Latent Variable Model for Learning in Pairwise Markov Networks

AAAI Conferences

Pairwise Markov Networks (PMN) are an important class of Markov networks which, due to their simplicity, are widely used in many applications such as image analysis, bioinformatics, sensor networks, etc. However, learning of Markov networks from data is a challenging task; there are many possible structures one must consider and each of these structures comes with its own parameters making it easy to overfit the model with limited data. To deal with the problem, recent learning methods build upon the L1 regularization to express the bias towards sparse network structures. In this paper, we propose a new and more flexible framework that let us bias the structure, that can, for example, encode the preference to networks with certain local substructures which as a whole exhibit some special global structure. We experiment with and show the benefit of our framework on two types of problems: learning of modular networks and learning of traffic networks models.


Decomposed Utility Functions and Graphical Models for Reasoning about Preferences

AAAI Conferences

Recently, Brafman and Engel (2009) proposed new concepts of marginal and conditional utility that obey additive analogues of the chain rule and Bayes rule, which they employed to obtain a directed graphical model of utility functions that resembles Bayes nets. In this paper we carry this analogy a step farther by showing that the notion of utility independence, built on conditional utility, satisfies identical properties to those of probabilistic independence. This allows us to formalize the construction of graphical models for utility functions, directed and undirected, and place them on the firm foundations of Pearl and Paz's axioms of semi-graphoids. With this strong equivalence in place, we show how algorithms used for probabilistic reasoning such as Belief Propagation (Pearl 1988) can be replicated to reasoning about utilities with the same formal guarantees, and open the way to the adaptation of additional algorithms.


Dirichlet Process Mixtures of Generalized Linear Models

arXiv.org Machine Learning

We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression models. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART, Bayesian trees and Gaussian processes. Compared to existing techniques, the DP-GLM provides a single model (and corresponding inference algorithms) that performs well in many regression settings.