AITopics | Smyth, Padhraic

Asynchronous Distributed Learning of Topic Models

Smyth, Padhraic, Welling, Max, Asuncion, Arthur U.

Neural Information Processing Systems, Dec-31-2009

Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two well-known unsupervised learning frameworks: Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Processes (HDP). In the proposed approach, the data are distributed across P processors, and processors independently perform Gibbs sampling on their local data and communicate their information in a local asynchronous manner with other processors. We demonstrate that our asynchronous algorithms are able to learn global topic models that are statistically as accurate as those learned by the standard LDA and HDP samplers, but with significant improvements in computation time and memory. We show speedup results on a 730-million-word text corpus using 32 processors, and we provide perplexity results for up to 1500 virtual processors. As a stepping stone in the development of asynchronous HDP, a parallel HDP sampler is also introduced.

artificial intelligence, processor, text processing, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.92)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Distributed Inference for Latent Dirichlet Allocation

Newman, David, Smyth, Padhraic, Welling, Max, Asuncion, Arthur U.

Neural Information Processing Systems, Dec-31-2008

We investigate the problem of learning a widely-used latent-variable model - the Latent Dirichlet Allocation (LDA) or "topic" model - using distributed computation, where each of

artificial intelligence, bayesian inference, processor, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)

Text Modeling using Unsupervised Topic Models and Concept Hierarchies

Chemudugunta, Chaitanya, Smyth, Padhraic, Steyvers, Mark

arXiv.org Artificial Intelligence, Aug-7-2008

Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover a broad range of themes in a data set, the interpretability of the learned topics is not always ideal. Human-defined concepts, on the other hand, tend to be semantically richer due to careful selection of words to define concepts but they tend not to cover the themes in a data set exhaustively. In this paper, we propose a probabilistic framework to combine a hierarchy of human-defined semantic concepts with statistical topic models to seek the best of both worlds. Experimental results using two different sources of concept hierarchies and two collections of text documents indicate that this combination leads to systematic improvements in the quality of the associated language models as well as enabling new techniques for inferring and visualizing the semantics of a document.

artificial intelligence, concept-topic model, text processing, (17 more...)

arXiv.org Artificial Intelligence

0808.0973

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report (0.64)

Industry: Materials > Chemicals (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.71)

Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models

Ihler, Alexander T., Smyth, Padhraic

Neural Information Processing Systems, Dec-31-2007

Data sets that characterize human activity over time through collections of timestamped events or counts are of increasing interest in application areas such as human-computer interaction, video surveillance, and Web data analysis. We propose a nonparametric Bayesian framework for modeling collections of such data. In particular, we use a Dirichlet process framework for learning a set of intensity functions corresponding to different categories, which form a basis set for representing individual time-periods (e.g., several days) depending on which categories the time-periods are assigned to. This allows the model to learn in a data-driven fashion what "factors" are generating the observations on a particular day, including (for example) weekday versus weekend effects or day-specific effects corresponding to unique (single-day) occurrences of unusual behavior, sharing information where appropriate to obtain improved estimates of the behavior associated with each category. Applications to real-world data sets of count data involving both vehicles and people are used to illustrate the technique.

Hierarchical Dirichlet Processes with Random Effects

Kim, Seyoung, Smyth, Padhraic

Neural Information Processing Systems, Dec-31-2007

In addition, each group is allowed to have its own component parameters coming from a prior described by a template mixture model.

bayesian inference, health & medicine, hierarchical dp, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Orange County > Irvine (0.14)
North America > Canada > Ontario > Toronto (0.14)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model

Chemudugunta, Chaitanya, Smyth, Padhraic, Steyvers, Mark

Neural Information Processing Systems, Dec-31-2007

Approaches such as LSI and LDA have both been shown to be useful for "object matching" in their

query, text processing, us government, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report (0.68)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Voting & Elections (0.94)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)

Joint Probabilistic Curve Clustering and Alignment

Gaffney, Scott J., Smyth, Padhraic

Neural Information Processing Systems, Dec-31-2005

Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous manner, either in space (across the measurements) or in time. We develop a probabilistic framework that allows for joint clustering and continuous alignment of sets of curves in curve space (as opposed to a fixed-dimensional feature-vector space). The proposed methodology integrates new probabilistic alignment models with model-based curve clustering algorithms. The probabilistic approach allows for the derivation of consistent EM learning algorithms for the joint clustering-alignment problem. Experimental results are shown for alignment of human growth data, and joint clustering and alignment of gene expression time-course data.

algorithm, artificial intelligence, health & medicine, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California > Orange County > Irvine (0.14)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Data Science (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Gene Expression Clustering with Functional Mixture Models

Chudova, Darya, Hart, Christopher, Mjolsness, Eric, Smyth, Padhraic

Neural Information Processing Systems, Dec-31-2004

We propose a functional mixture model for simultaneous clustering and alignment of sets of curves measured on a discrete time grid. The model is specifically tailored to gene expression time-course data. Each functional cluster center is a nonlinear combination of solutions of a simple linear differential equation that describes the change of individual mRNA levels when the synthesis and decay rates are constant. The mixture of continuous-time parametric functional forms allows one to (a) account for the heterogeneity in the observed profiles, (b) align the profiles in time by estimating real-valued time shifts, (c) capture the synthesis and decay of mRNA in the course of an experiment, and (d) regularize noisy profiles by enforcing smoothness in the mean curves. We derive an EM algorithm for estimating the parameters of the model, and apply the proposed approach to the set of cycling genes in yeast. The experiments show consistent improvement in predictive power and within-cluster variance compared to regular Gaussian mixtures.

artificial intelligence, functional form, health & medicine, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Orange County > Irvine (0.29)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: