Goto

Collaborating Authors

 Discourse & Dialogue


Gaussian Hierarchical Latent Dirichlet Allocation: Bringing Polysemy Back

arXiv.org Machine Learning

Topic models are widely used to discover the latent representation of a set of documents. The two canonical models are latent Dirichlet allocation, and Gaussian latent Dirichlet allocation, where the former uses multinomial distributions over words, and the latter uses multivariate Gaussian distributions over pre-trained word embedding vectors as the latent topic representations, respectively. Compared with latent Dirichlet allocation, Gaussian latent Dirichlet allocation is limited in the sense that it does not capture the polysemy of a word such as ``bank.'' In this paper, we show that Gaussian latent Dirichlet allocation could recover the ability to capture polysemy by introducing a hierarchical structure in the set of topics that the model can use to represent a given document. Our Gaussian hierarchical latent Dirichlet allocation significantly improves polysemy detection compared with Gaussian-based models and provides more parsimonious topic representations compared with hierarchical latent Dirichlet allocation. Our extensive quantitative experiments show that our model also achieves better topic coherence and held-out document predictive accuracy over a wide range of corpus and word embedding vectors.


Gated Mechanism for Attention Based Multimodal Sentiment Analysis

arXiv.org Machine Learning

ABSTRACT different granularities [3, 9] or use a cross interaction block that couple the features from different modalities [10, 6]. It is imperative that all modalities in multimodal interactions and 3. Fusion of unimodal and cross Therefore, to learn better cross modal information, we introduce 1.6% and 1.34% absolute improvement over current state-ofthe-art. Furthermore, to capture long term dependencies across 1. INTRODUCTION These are categorised into three types, 1. Methods that learn the modalities independently and fuse the In our proposed model, we aim to learn the interaction between [3, 4], and 3. Methods that explicitly learn contributions Personal use of this material is permitted. Multimodal sentiment analysis provides an opportunity to 2.1. M T V H T W H T V; W R d d (3) (U 1, U 2,..., U u) for a Text modality can be defined as: Cross attentive representations of Text (C V T R u d) and H T Bi-GRU(U 1, U 2,..., U u) (1) Video (C T V R u d) can be represented as: Subscript T denotes Text modality, A and V represent Audio As much as there is an opportunity to leverage cross modal interactions, representations is employed.


Semi-Supervised Class Discovery

arXiv.org Machine Learning

One promising approach to dealing with datapoints that are outside of the initial training distribution (OOD) is to create new classes that capture similarities in the datapoints previously rejected as uncategorizable. Systems that generate labels can be deployed against an arbitrary amount of data, discovering classification schemes that through training create a higher quality representation of data. We introduce the Dataset Reconstruction Accuracy, a new and important measure of the effectiveness of a model's ability to create labels. We introduce benchmarks against this Dataset Reconstruction metric. We apply a new heuristic, class learnability, for deciding whether a class is worthy of addition to the training dataset. We show that our class discovery system can be successfully applied to vision and language, and we demonstrate the value of semi-supervised learning in automatically discovering novel classes.


Analyzing Customer Support on Social Media - Qualetics Data Machines

#artificialintelligence

The goal of this study is to analyze the queries raised by customers on a particular social media platform by analyzing their interactions with the customer support and provide incisive insights to perform sentiment analysis. We performed exploratory data analysis to extract insights from the data. With Deep Learning tools like NLTK, sentiment analysis was performed to understand the positive, negative, and neutral sentiments of the customers of a brand. Machine Learning was used to identify the frequency of similar text appearances. Deep learning algorithms were used to understand the customer queries and the average time taken by the respective company's social customer support team in addressing the queries.


Sentiment Analysis with the bag-of-words

#artificialintelligence

As a precursor to research about Sentiment Analysis with Text Classifiers (Naive Bayes, Maximum Entropy, SVM), Sentiment Analysis with bag-of-words was done and Positive / Negative Sentiment was detected with an accuracy of 60%. This is when only unigrams are used. This percentage will be much when bigrams or trigrams are used (in a next blog-post). See the results at: part 1: http://tinyurl.com/gnlfqqm


Non-Autoregressive Dialog State Tracking

arXiv.org Artificial Intelligence

Recent efforts in Dialogue State Tracking (DST) for task-oriented dialogues have progressed toward open-vocabulary or generation-based approaches where the models can generate slot value candidates from the dialogue history itself. These approaches have shown good performance gain, especially in complicated dialogue domains with dynamic slot values. However, they fall short in two aspects: (1) they do not allow models to explicitly learn signals across domains and slots to detect potential dependencies among (domain, slot) pairs; and (2) existing models follow auto-regressive approaches which incur high time cost when the dialogue evolves over multiple domains and multiple turns. In this paper, we propose a novel framework of Non-Autoregressive Dialog State Tracking (NADST) which can factor in potential dependencies among domains and slots to optimize the models towards better prediction of dialogue states as a complete set rather than separate slots. In particular, the non-autoregressive nature of our method not only enables decoding in parallel to significantly reduce the latency of DST for real-time dialogue response generation, but also detect dependencies among slots at token level in addition to slot and domain level. Our empirical results show that our model achieves the state-of-the-art joint accuracy across all domains on the MultiWOZ 2.1 corpus, and the latency of our model is an order of magnitude lower than the previous state of the art as the dialogue history extends over time.


Asynchronous Distributed Learning of Topic Models

Neural Information Processing Systems

Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two well-known unsupervised learning frameworks: Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Processes (HDP). In the proposed approach, the data are distributed across P processors, and processors independently perform Gibbs sampling on their local data and communicate their information in a local asynchronous manner with other processors. We demonstrate that our asynchronous algorithms are able to learn global topic models that are statistically as accurate as those learned by the standard LDA and HDP samplers, but with significant improvements in computation time and memory. We show speedup results on a 730-million-word text corpus using 32 processors, and we provide perplexity results for up to 1500 virtual processors.


Spatial Latent Dirichlet Allocation

Neural Information Processing Systems

In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely appled in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a bag-of-words''. It is also critical to properly design words'' and "documents" when using a language model to solve vision problems. In this paper, we propose a topic model Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structure among visual words that are essential for solving many vision problems. The spatial information is not encoded in the value of visual words but in the design of documents.


Supervised Topic Models

Neural Information Processing Systems

We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.


Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units

Neural Information Processing Systems

The recent emergence of Graphics Processing Units (GPUs) as general-purpose parallel computing devices provides us with new opportunities to develop scalable learning methods for massive data. In this work, we consider the problem of parallelizing two inference methods on GPUs for latent Dirichlet Allocation (LDA) models, collapsed Gibbs sampling (CGS) and collapsed variational Bayesian (CVB). To address limited memory constraints on GPUs, we propose a novel data partitioning scheme that effectively reduces the memory cost. Furthermore, the partitioning scheme balances the computational cost on each multiprocessor and enables us to easily avoid memory access conflicts. We also use data streaming to handle extremely large datasets. Extensive experiments showed that our parallel inference methods consistently produced LDA models with the same predictive power as sequential training methods did but with 26x speedup for CGS and 196x speedup for CVB on a GPU with 30 multiprocessors; actually the speedup is almost linearly scalable with the number of multiprocessors available.