AITopics | Discourse & Dialogue

Discourse & Dialogue

Understanding Language in Conversations

"The problems addressed in discourse research aim to answer two general kinds of questions: (1) what information is contained in extended sequences of utterances that goes beyond the meaning of the individual utterances themselves? (2) how does the context in which an utterance is used affect the meaning of the individual utterances, or parts of them?"
– Barbara Grosz. Overview of Chapter 6: Discourse and Dialogue, Survey of the State of the Art in Human Language Technology (1996).

Correlated Topic Models

Neural Information Processing Systems | Apr-6-2023, 15:26:41 GMT

Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than x-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution [1].
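The contrast between the Dirichlet and the logistic normal can be made concrete with a small sketch. It is an illustration only, not the paper's inference code: the three-topic setup, the mean vector `MU`, and the Cholesky factor `CHOL` are hypothetical values chosen so that the first two topics co-vary. Proportions are drawn by sampling a Gaussian vector and mapping it onto the simplex with a softmax.

```python
import math
import random


def sample_logistic_normal(mu, chol, rng):
    """Draw topic proportions theta = softmax(eta), eta ~ N(mu, chol * chol^T)."""
    k = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    # chol is lower triangular, so eta[i] only uses z[0..i].
    eta = [mu[i] + sum(chol[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
    # Softmax with max-subtraction for numerical stability.
    top = max(eta)
    weights = [math.exp(e - top) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical 3-topic example: topics 0 and 1 (say, genetics and disease)
# have correlated Gaussian components; topic 2 (astronomy) is independent.
MU = [0.0, 0.0, 0.0]
CHOL = [
    [1.0, 0.0, 0.0],
    [0.9, 0.436, 0.0],
    [0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_logistic_normal(MU, CHOL, rng) for _ in range(5000)]
    r = correlation([d[0] for d in draws], [d[1] for d in draws])
    print(f"corr(theta_0, theta_1) = {r:.2f}")
```

Under a Dirichlet, any two proportions are negatively correlated, so a positive correlation between two topics is something the LDA prior cannot express.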

correlated topic model, lda

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Collapsed Variational Inference for HDP

Neural Information Processing Systems | Apr-6-2023, 14:53:00 GMT

A wide variety of Dirichlet-multinomial 'topic' models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and side-stepping of issues with topic-identifiability. The most accurate variational technique thus far, namely collapsed variational latent Dirichlet allocation, did not deal with model selection nor did it include inference for hyperparameters. We address both issues by generalizing the technique, obtaining the first variational algorithm to deal with the hierarchical Dirichlet process and to deal with hyperparameters of Dirichlet variables. Experiments show a significant improvement in accuracy.

collapsed variational inference, hdp, hyperparameter

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)

Spatial Latent Dirichlet Allocation

Neural Information Processing Systems | Apr-6-2023, 14:52:53 GMT

In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a "bag-of-words". It is also critical to properly design "words" and "documents" when using a language model to solve vision problems. In this paper, we propose a topic model, Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structure among visual words that are essential for solving many vision problems. The spatial information is not encoded in the value of visual words but in the design of documents.

latent dirichlet allocation, spatial latent dirichlet allocation, visual word, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Supervised Topic Models

Neural Information Processing Systems | Apr-6-2023, 14:51:47 GMT

We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.
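As a rough illustration of the prediction step, the sketch below assumes the Gaussian-response form of sLDA, in which a document's response is modeled as linear in its empirical topic frequencies. The coefficient vector `eta` and the topic assignments are made-up values, and a real system would obtain expected topic frequencies from variational inference rather than from fixed assignments.

```python
from collections import Counter


def topic_frequencies(assignments, num_topics):
    """Fraction of a document's words assigned to each topic."""
    counts = Counter(assignments)
    n = len(assignments)
    return [counts.get(k, 0) / n for k in range(num_topics)]


def predict_response(assignments, eta):
    """Predicted response: dot product of eta with the topic frequencies."""
    zbar = topic_frequencies(assignments, len(eta))
    return sum(e * z for e, z in zip(eta, zbar))


if __name__ == "__main__":
    # Hypothetical coefficients: topic 0 raises a rating, topic 2 lowers it.
    eta = [4.0, 2.0, -1.0]
    # Hypothetical per-word topic assignments for one short review.
    doc = [0, 0, 1, 2]
    print(predict_response(doc, eta))
```

Because the response depends on the topics directly, fitting the coefficients jointly with the topics can steer the topics toward ones that are predictive, which is the contrast with running LDA first and regressing afterwards.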

regression, slda, supervised topic model

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.78)

Distributed Inference for Latent Dirichlet Allocation

Neural Information Processing Systems | Apr-6-2023, 14:37:34 GMT

We investigate the problem of learning a widely-used latent-variable model, the Latent Dirichlet Allocation (LDA) or "topic" model, using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates; it is simple to implement and can be viewed as an approximation to a single-processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across P processors; it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning.
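The "periodic updates" in the first scheme can be pictured as a count-merging step. The sketch below is one plausible reading, not the authors' code: each processor starts a sweep from the same global topic-word counts, changes its local copy while sampling, and the changes are then summed back into the global table. The function name and the nested-list count layout are assumptions made for illustration.

```python
def merge_counts(global_counts, local_counts_per_proc):
    """Fold every processor's count changes back into the global table.

    global_counts: counts[topic][word] shared at the start of the sweep.
    local_counts_per_proc: one table per processor, each a copy of
    global_counts modified by that processor's local Gibbs sweep.
    """
    merged = [row[:] for row in global_counts]
    for local in local_counts_per_proc:
        for k, row in enumerate(local):
            for w, value in enumerate(row):
                # Add only this processor's change relative to the start.
                merged[k][w] += value - global_counts[k][w]
    return merged


if __name__ == "__main__":
    start = [[3, 1], [0, 2]]
    proc_a = [[2, 1], [1, 2]]  # moved one token of word 0 from topic 0 to 1
    proc_b = [[3, 0], [0, 3]]  # moved one token of word 1 from topic 0 to 1
    print(merge_counts(start, [proc_a, proc_b]))
```

Each processor only reassigns tokens between topics, so the total token count is preserved by the merge. The approximation lies in the fact that, during a sweep, processors sample against counts that do not yet reflect each other's changes.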

inference, latent dirichlet allocation, processor, (2 more...)

Country: South America > Paraguay > Asunción > Asunción (0.09)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.88)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.65)

Syntactic Topic Models

Neural Information Processing Systems | Apr-6-2023, 14:22:50 GMT

We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. Each word of a sentence is generated by a distribution that combines document-specific topic weights and parse-tree-specific syntactic transitions. Words are assumed to be generated in an order that respects the parse tree. We derive an approximate posterior inference method based on variational methods for hierarchical Dirichlet processes, and we report qualitative and quantitative results on both synthetic data and hand-parsed documents.

parse tree, syntactic topic model

Country: Asia > Middle East > Jordan (0.12)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.48)

DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification

Neural Information Processing Systems | Apr-6-2023, 14:21:52 GMT

Probabilistic topic models (and their extensions) have become popular as models of latent structures in collections of text documents or images. These models are usually treated as generative models and trained using maximum likelihood estimation, an approach which may be suboptimal in the context of an overall classification problem. In this paper, we describe DiscLDA, a discriminative learning framework for such models as Latent Dirichlet Allocation (LDA) in the setting of dimensionality reduction with supervised side information. In DiscLDA, a class-dependent linear transformation is introduced on the topic mixture proportions. This parameter is estimated by maximizing the conditional likelihood using Monte Carlo EM.
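The class-dependent transformation can be sketched as a matrix-vector product. This is an illustration under assumptions: the matrices in `TRANSFORMS` are invented, and their columns are chosen to sum to one so that the transformed vector remains a valid set of proportions.

```python
def transform_proportions(matrix, theta):
    """Apply a class-specific linear map to topic proportions theta."""
    return [sum(row[j] * theta[j] for j in range(len(theta))) for row in matrix]


# Hypothetical transformations for two classes. Each column sums to 1,
# so a point on the simplex is mapped to a point on the simplex.
TRANSFORMS = {
    "class_a": [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    "class_b": [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
}

if __name__ == "__main__":
    theta = [0.7, 0.3]
    for label, matrix in TRANSFORMS.items():
        print(label, transform_proportions(matrix, theta))
```

In this toy setup the second input topic is shared by both classes while the first is routed to a class-specific slot, which is one way such a map could make the transformed proportions informative about the class label.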

Asynchronous Distributed Learning of Topic Models

Neural Information Processing Systems | Apr-6-2023, 14:09:14 GMT

Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two well-known unsupervised learning frameworks: Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Processes (HDP). In the proposed approach, the data are distributed across P processors, and processors independently perform Gibbs sampling on their local data and communicate their information in a local asynchronous manner with other processors. We demonstrate that our asynchronous algorithms are able to learn global topic models that are statistically as accurate as those learned by the standard LDA and HDP samplers, but with significant improvements in computation time and memory. We show speedup results on a 730-million-word text corpus using 32 processors, and we provide perplexity results for up to 1500 virtual processors.
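Perplexity, the metric reported here, is the exponentiated negative average log-likelihood per held-out word; lower is better. A minimal sketch, assuming the per-word probabilities have already been computed by some fitted model (the values below are invented):

```python
import math


def perplexity(word_probs):
    """exp of the negative mean log-probability of held-out words."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_lik = sum(math.log(p) for p in word_probs)
    return math.exp(-log_lik / len(word_probs))


if __name__ == "__main__":
    # Hypothetical probabilities a model assigns to four held-out words.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

A model that assigns every held-out word probability 1/V has perplexity V, so the number can be read as an effective vocabulary size the model is choosing among.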

asynchronous, processor, topic model, (3 more...)

Country: South America > Paraguay > Asunción > Asunción (0.10)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.65)

Reading Tea Leaves: How Humans Interpret Topic Models

Neural Information Processing Systems | Apr-6-2023, 14:08:19 GMT

Probabilistic topic models are a popular tool for the unsupervised analysis of text, providing both a predictive model of future text and a latent topic representation of the corpus. Practitioners typically assume that the latent space is semantically meaningful. It is used to check models, summarize the corpus, and guide exploration of its contents. However, whether the latent space is interpretable is in need of quantitative evaluation. In this paper, we present new quantitative methods for measuring semantic meaning in inferred topics. We back these measures with large-scale user studies, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.

held-out likelihood, human interpret topic model, latent space, (1 more...)

Country: Asia > Middle East > Jordan (0.11)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.73)
Information Technology > Modeling & Simulation (0.67)

Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units

Neural Information Processing Systems | Apr-6-2023, 14:07:19 GMT

The recent emergence of Graphics Processing Units (GPUs) as general-purpose parallel computing devices provides us with new opportunities to develop scalable learning methods for massive data. In this work, we consider the problem of parallelizing two inference methods on GPUs for Latent Dirichlet Allocation (LDA) models: collapsed Gibbs sampling (CGS) and collapsed variational Bayesian (CVB) inference. To address limited memory constraints on GPUs, we propose a novel data partitioning scheme that effectively reduces the memory cost. Furthermore, the partitioning scheme balances the computational cost on each multiprocessor and enables us to easily avoid memory access conflicts. We also use data streaming to handle extremely large datasets.

dirichlet allocation, graphic processing unit, latent dirichlet allocation, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.65)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.65)