 Discourse & Dialogue


A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics

arXiv.org Machine Learning

We propose a new method of estimation in topic models that is not a variation on the existing simplex-finding algorithms and that estimates the number of topics K from the observed data. We derive new finite-sample minimax lower bounds for the estimation of the word-topic matrix A, as well as new upper bounds for our proposed estimator. We describe the scenarios where our estimator is minimax adaptive. Our finite-sample analysis is valid for any number of documents (n), individual document length (N_i), dictionary size (p) and number of topics (K), and both p and K are allowed to increase with n, a situation not handled well by previous analyses. We complement our theoretical results with a detailed simulation study. We illustrate that the new algorithm is faster and more accurate than the current ones, even though it starts at a computational and theoretical disadvantage: it does not know the correct number of topics K, whereas we provide the competing methods with the correct value in our simulations.


A Beginner's Guide on Sentiment Analysis with RNN – Towards Data Science

@machinelearnbot

In order to feed this data into our RNN, all input documents must have the same length. We will limit the maximum review length to max_words by truncating longer reviews and padding shorter reviews with a null value (0). We can accomplish this using the pad_sequences() function in Keras. For now, set max_words to 500. We start building our model architecture in the code cell below.
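The truncate-and-pad step described above can be sketched in plain Python. This is a minimal illustration, not the article's code: the helper name `pad_or_truncate` and the sample word indices are made up here, and the helper only mimics what Keras's `pad_sequences()` does with its default settings (zeros added at the front, over-long sequences cut from the front).

```python
def pad_or_truncate(sequences, max_words, value=0):
    """Force every sequence to exactly max_words entries.

    Hypothetical helper that imitates the default behaviour of Keras's
    pad_sequences (padding="pre", truncating="pre"): padding values go
    at the front, and over-long reviews keep their last max_words tokens.
    """
    fixed = []
    for seq in sequences:
        seq = list(seq)
        # Drop tokens from the front if the review is too long.
        seq = seq[max(len(seq) - max_words, 0):]
        # Pad at the front if the review is too short.
        pad = [value] * (max_words - len(seq))
        fixed.append(pad + seq)
    return fixed


# Made-up word indices standing in for two encoded reviews.
reviews = [[11, 4, 27], [3, 8, 15, 9, 2, 6]]
padded = pad_or_truncate(reviews, max_words=4)
print(padded)  # [[0, 11, 4, 27], [15, 9, 2, 6]]
```

With Keras itself, the equivalent call would look like `pad_sequences(X_train, maxlen=max_words)`, where `X_train` is an assumed name for the list of encoded reviews.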


Semi-supervised and Transfer learning approaches for low resource sentiment classification

arXiv.org Machine Learning

Sentiment classification involves quantifying the affective reaction of a human to a document, media item or event. Although researchers have investigated several methods to reliably infer sentiment from lexical, speech and body language cues, training a model with a small set of labeled data is still a challenge. For instance, when expanding sentiment analysis to new languages and cultures, it may not always be possible to obtain comprehensive labeled datasets. In this paper, we investigate the application of semi-supervised and transfer learning methods to improve performance on low-resource sentiment classification tasks. We experiment with extracting dense feature representations, pre-training and manifold regularization to enhance the performance of sentiment classification systems. Our goal is a coherent implementation of these methods, and we evaluate the gains they achieve in a matched setting, which involves training and testing on a single corpus, as well as in two cross-corpora settings. In both cases, our experiments demonstrate that the proposed methods can significantly enhance model performance over a purely supervised approach, particularly when only a handful of training examples are available.


Topic Modeling and Latent Dirichlet Allocation (LDA) in Python

#artificialintelligence

Topic modeling is a type of statistical modeling for discovering the abstract "topics" that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is an example of a topic model and is used to assign the text in a document to a particular topic. It builds a topics-per-document model and a words-per-topic model, both modeled as Dirichlet distributions. Here we are going to apply LDA to a set of documents and split them into topics. The data set we'll use is a list of over one million news headlines published over a period of 15 years; it can be downloaded from Kaggle.
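To make the two Dirichlet-distributed pieces concrete, here is a minimal sketch of the generative story that LDA assumes, written with the Python standard library only. It does not fit a model to the Kaggle headlines: the tiny vocabulary, the two topics and the concentration value 0.5 are all made-up assumptions, and the function names are hypothetical. Fitting LDA amounts to inverting this process to recover the topic and word distributions from observed text.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, doc_alpha, length, rng):
    """Generate one document under the LDA generative assumptions."""
    # Topics-per-document distribution for this document.
    theta = sample_dirichlet(doc_alpha, rng)
    word_ids = []
    for _ in range(length):
        # Pick a topic for this word position, then a word from that topic.
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        weights = topic_word[topic]
        word_ids.append(rng.choices(range(len(weights)), weights=weights)[0])
    return theta, word_ids


rng = random.Random(0)
vocab = ["rain", "flood", "storm", "vote", "election", "senate"]  # toy vocabulary
num_topics = 2

# Words-per-topic distributions: one Dirichlet draw over the vocabulary per topic.
topic_word = [sample_dirichlet([0.5] * len(vocab), rng) for _ in range(num_topics)]

theta, word_ids = generate_document(topic_word, [0.5] * num_topics, 8, rng)
headline = [vocab[i] for i in word_ids]
print("topic mixture:", theta)
print("generated words:", headline)
```

A small concentration value such as 0.5 tends to produce sparse distributions, so each toy document leans toward one topic and each topic toward a few words.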


Psychological State in Text: A Limitation of Sentiment Analysis

arXiv.org Artificial Intelligence

Starting with the idea that sentiment analysis models should be able to predict not only positive or negative sentiment but also other psychological states of a person, we implement a sentiment analysis model to investigate the relationship between the model and emotional state. We first take psychological measurements of 64 participants and ask them to write a book report about a story. After that, we train our sentiment analysis model using crawled movie review data. Finally, we evaluate the participants' writing using the pretrained model, as a form of transfer learning. The results show that the sentiment analysis model performs well at predicting a score, but the score does not correlate with the participants' self-reported sentiment.



Text Mining and Sentiment Analysis - A Primer

@machinelearnbot

Over the years, a crucial part of data-gathering behavior has revolved around what other people think. With the constantly growing popularity and availability of opinion-driven resources such as personal blogs and online review sites, new challenges and opportunities are emerging as people use advanced technologies to make decisions. Sentiment analysis, or opinion mining, refers to the use of computational linguistics, text analytics and natural language processing to identify and extract information from source materials. Sentiment analysis is considered one of the most popular applications of text analytics. Its primary aspect is analyzing the body of a text to understand the opinion it expresses, along with other key factors such as modality and mood.


Artificial intelligence: Do it your way - SD Times

#artificialintelligence

More often than not, the best initial use case for AI won't be the company's biggest problem. Making AI real means going beyond the hype, focusing on what is doable in a defined timeframe, with the budget, resources and data that are available. By doing this, firms often discover a more specific use case than they initially considered. For example, instead of trying to improve prediction of customer demand overall, they start with sentiment analysis on social media to establish a better customer dialogue process. It doesn't matter how big the initial use case is.


Fully Statistical Neural Belief Tracking

arXiv.org Artificial Intelligence

This paper proposes an improvement to the existing data-driven Neural Belief Tracking (NBT) framework for Dialogue State Tracking (DST). The existing NBT model uses a hand-crafted belief state update mechanism which involves an expensive manual retuning step whenever the model is deployed to a new dialogue domain. We show that this update mechanism can be learned jointly with the semantic decoding and context modelling parts of the NBT model, eliminating the last rule-based module from this DST framework. We propose two different statistical update mechanisms and show that dialogue dynamics can be modelled with a very small number of additional model parameters. In our DST evaluation over three languages, we show that this model achieves competitive performance and provides a robust framework for building resource-light DST models.


How to Perform Sentiment Analysis in Excel Without Writing Code?

#artificialintelligence

We recently announced a new version of our Excel Add-in, which lets you perform state-of-the-art text analysis from the comfort of your spreadsheets without writing a single line of code. The add-in has been received very well by users across industry verticals such as Market Research, Software, Consumer Goods and Education, who use it to solve a variety of use cases. Sentiment analysis has been the most used function of our Excel add-in, closely followed by Emotion detection. Many of our users use sentiment analysis in Excel to quickly and accurately analyze responses to their open-ended surveys, online chatter around their product or service, or product reviews from e-commerce sites. In this blog post, we will discuss how to use the Sentiment Analysis function in the Excel Add-in to do text analytics for any type of content.