Goto

Collaborating Authors

 Directed Networks


Continuation Methods for Mixing Heterogenous Sources

arXiv.org Machine Learning

A number of modern learning tasks involve estimation from heterogeneous information sources. This includes classification with labeled and unlabeled data as well as other problems with analogous structure such as competitive (game theoretic) problems. The associated estimation problems can be typically reduced to solving a set of fixed point equations (consistency conditions). We introduce a general method for combining a preferred information source with another in this setting by evolving continuous paths of fixed points at intermediate allocations. We explicitly identify critical points along the unique paths to either increase the stability of estimation or to ensure a significant departure from the initial source. The homotopy continuation approach is guaranteed to terminate at the second source, and involves no combinatorial effort. We illustrate the power of these ideas both in classification tasks with labeled and unlabeled data, as well as in the context of a competitive (min-max) formulation of DNA sequence motif discovery.


Coordinates: Probabilistic Forecasting of Presence and Availability

arXiv.org Artificial Intelligence

We present methods employed in COORDINATE, a prototype service that supports collaboration and communication by learning predictive models that provide forecasts of users' presence and availability. We describe how data is collected about user activity and proximity from multiple devices, in addition to analysis of the content of users' calendars, the time of day, and day of week. We review applications of presence forecasting embedded in the PRIORITIES application and then present details of the COORDINATE service that was informed by the earlier efforts.


Accelerating Inference: towards a full Language, Compiler and Hardware stack

arXiv.org Machine Learning

We introduce Dimple, a fully open-source API for probabilistic modeling. Dimple allows the user to specify probabilistic models in the form of graphical models, Bayesian networks, or factor graphs, and performs inference (by automatically deriving an inference engine from a variety of algorithms) on the model. Dimple also serves as a compiler for GP5, a hardware accelerator for inference.


Bayesian one-mode projection for dynamic bipartite graphs

arXiv.org Machine Learning

We propose a Bayesian methodology for one-mode projecting a bipartite network that is being observed across a series of discrete time steps. The resulting one mode network captures the uncertainty over the presence/absence of each link and provides a probability distribution over its possible weight values. Additionally, the incorporation of prior knowledge over previous states makes the resulting network less sensitive to noise and missing observations that usually take place during the data collection process. The methodology consists of computationally inexpensive update rules and is scalable to large problems, via an appropriate distributed implementation.


Unsupervised Active Learning in Large Domains

arXiv.org Machine Learning

Active learning is a powerful approach to analyzing data effectively. We show that the feasibility of active learning depends crucially on the choice of measure with respect to which the query is being optimized. The standard information gain, for example, does not permit an accurate evaluation with a small committee, a representative subset of the model space. We propose a surrogate measure requiring only a small committee and discuss the properties of this new measure. We devise, in addition, a bootstrap approach for committee selection. The advantages of this approach are illustrated in the context of recovering (regulatory) network models.


Bayesian Network Classifiers in a High Dimensional Framework

arXiv.org Machine Learning

We present a growing dimension asymptotic formalism. The perspective in this paper is classification theory and we show that it can accommodate probabilistic networks classifiers, including naive Bayes model and its augmented version. When represented as a Bayesian network these classifiers have an important advantage: The corresponding discriminant function turns out to be a specialized case of a generalized additive model, which makes it possible to get closed form expressions for the asymptotic misclassification probabilities used here as a measure of classification accuracy. Moreover, in this paper we propose a new quantity for assessing the discriminative power of a set of features which is then used to elaborate the augmented naive Bayes classifier. The result is a weighted form of the augmented naive Bayes that distributes weights among the sets of features according to their discriminative power. We derive the asymptotic distribution of the sample based discriminative power and show that it is seriously overestimated in a high dimensional case. We then apply this result to find the optimal, in a sense of minimum misclassification probability, type of weighting.


Staged Mixture Modelling and Boosting

arXiv.org Machine Learning

In this paper, we introduce and evaluate a data-driven staged mixture modeling technique for building density, regression, and classification models. Our basic approach is to sequentially add components to a finite mixture model using the structural expectation maximization (SEM) algorithm. We show that our technique is qualitatively similar to boosting. This correspondence is a natural byproduct of the fact that we use the SEM algorithm to sequentially fit the mixture model. Finally, in our experimental evaluation, we demonstrate the effectiveness of our approach on a variety of prediction and density estimation tasks using real-world data.


Dimension Correction for Hierarchical Latent Class Models

arXiv.org Machine Learning

Model complexity is an important factor to consider when selecting among graphical models. When all variables are observed, the complexity of a model can be measured by its standard dimension, i.e. the number of independent parameters. When hidden variables are present, however, standard dimension might no longer be appropriate. One should instead use effective dimension (Geiger et al. 1996). This paper is concerned with the computation of effective dimension. First we present an upper bound on the effective dimension of a latent class (LC) model. This bound is tight and its computation is easy. We then consider a generalization of LC models called hierarchical latent class (HLC) models (Zhang 2002). We show that the effective dimension of an HLC model can be obtained from the effective dimensions of some related LC models. We also demonstrate empirically that using effective dimension in place of standard dimension improves the quality of models learned from data.


An Information-Theoretic External Cluster-Validity Measure

arXiv.org Machine Learning

In this paper we propose a measure of clustering quality or accuracy that is appropriate in situations where it is desirable to evaluate a clustering algorithm by somehow comparing the clusters it produces with ``ground truth' consisting of classes assigned to the patterns by manual means or some other means in whose veracity there is confidence. Such measures are refered to as ``external'. Our measure also has the characteristic of allowing clusterings with different numbers of clusters to be compared in a quantitative and principled way. Our evaluation scheme quantitatively measures how useful the cluster labels of the patterns are as predictors of their class labels. In cases where all clusterings to be compared have the same number of clusters, the measure is equivalent to the mutual information between the cluster labels and the class labels. In cases where the numbers of clusters are different, however, it computes the reduction in the number of bits that would be required to encode (compress) the class labels if both the encoder and decoder have free acccess to the cluster labels. To achieve this encoding the estimated conditional probabilities of the class labels given the cluster labels must also be encoded. These estimated probabilities can be seen as a model for the class labels and their associated code length as a model cost.


Learning with Scope, with Application to Information Extraction and Classification

arXiv.org Machine Learning

In probabilistic approaches to classification and information extraction, one typically builds a statistical model of words under the assumption that future data will exhibit the same regularities as the training data. In many data sets, however, there are scope-limited features whose predictive power is only applicable to a certain subset of the data. For example, in information extraction from web pages, word formatting may be indicative of extraction category in different ways on different web pages. The difficulty with using such features is capturing and exploiting the new regularities encountered in previously unseen data. In this paper, we propose a hierarchical probabilistic model that uses both local/scope-limited features, such as word formatting, and global features, such as word content. The local regularities are modeled as an unobserved random parameter which is drawn once for each local data set. This random parameter is estimated during the inference process and then used to perform classification with both the local and global features--- a procedure which is akin to automatically retuning the classifier to the local regularities on each newly encountered web page. Exact inference is intractable and we present approximations via point estimates and variational methods. Empirical results on large collections of web data demonstrate that this method significantly improves performance from traditional models of global features alone.