Discourse & Dialogue
15 Great Blogs Posted in the last 12 Months
This is part of a new series of articles: once or twice a month, we repost earlier articles that were very popular when first published. These articles are at least 6 months old but no more than 12 months old. The previous digest in this series was posted here a while back. Below is our fourth edition.
Top 20 Big Data Experts to Follow (Includes Scoring Algorithm)
Text Classification & Sentiment Analysis tutorial / blog
Learn Everything about Sentiment Analysis using R
1.5 TB dataset of anonymized user interactions released by Yahoo
Fuzzy Matching Algorithms To Help Data Scientists Match Similar Data
The Emotion Journal performs real-time sentiment analysis on your most personal stories
Andrew Greenstein, an app developer from San Francisco, started journaling a few months ago. He tries to write for five minutes every day, but it's challenging to set aside the time. Still, he's read that journaling reduces stress and can help with goal-setting, so he's trying to make it a habit. At the Disrupt London Hackathon, Greenstein and his team built The Emotion Journal, a voice journaling app that performs real-time emotional analysis to detect the user's feelings and chart their emotional state over time. By day, Greenstein is the CEO of SF AppWorks, a digital agency.
Supervised topic models for clinical interpretability
Hughes, Michael C., Elibol, Huseyin Melih, McCoy, Thomas, Perlis, Roy, Doshi-Velez, Finale
Supervised topic models can help clinical researchers find interpretable co-occurrence patterns in count data that are relevant for diagnostics. However, standard formulations of supervised Latent Dirichlet Allocation have two problems. First, when documents have many more words than labels, the influence of the labels will be negligible. Second, due to conditional independence assumptions in the graphical model, the impact of supervised labels on the learned topic-word probabilities is often minimal, leading to poor predictions on held-out data. We investigate penalized optimization methods for training sLDA that produce interpretable topic-word parameters and useful held-out predictions, using recognition networks to speed up inference. We report preliminary results on synthetic data and on predicting successful antidepressant medication given a patient's diagnostic history.
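The first problem can be made concrete with a generic sLDA-style objective. This is a sketch for intuition, not the authors' formulation: for one document with N word tokens and a single label y, the word part of the log-likelihood contributes N terms while the label contributes one, so for large N the label barely moves the topic-word probabilities φ. Here e_{z_n} is the one-hot indicator of token n's topic, η are the label regression weights, and the weight λ in the second line is a hypothetical illustration of how a penalty can upweight the label term.

```latex
% Illustrative sLDA-style joint log-likelihood for one document, conditioned on
% topic assignments z_{1:N}, topic-word probabilities \phi, and label weights \eta.
\log p(w_{1:N}, y \mid z_{1:N}, \phi, \eta)
  = \underbrace{\sum_{n=1}^{N} \log \phi_{z_n, w_n}}_{N\ \text{word terms}}
  + \underbrace{\log p\!\left(y \mid \eta^{\top} \bar{z}\right)}_{\text{one label term}},
\qquad \bar{z} = \frac{1}{N} \sum_{n=1}^{N} e_{z_n}

% One simple penalized alternative (hypothetical label weight \lambda > 1):
\mathcal{L}_{\lambda}
  = \sum_{n=1}^{N} \log \phi_{z_n, w_n}
  + \lambda \log p\!\left(y \mid \eta^{\top} \bar{z}\right)
```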
Opinion Mining - Extraction of opinions from free text - Dataconomy
So you report, with reasonable accuracy, what the sentiment about a particular brand or product is. After publishing this report, your client comes back to you and says "Hey, this is good. Now can you tell me ways in which I can convert the negative sentiments into positive sentiments?" This is where Sentiment Analysis stops and we enter the realm of Opinion Mining. Opinion Mining is about having a deeper understanding of the review that was written. Typically, a detailed review will not just have a sentiment attached to it. It will have information and valuable feedback that can literally help to build the next strategy.
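The shift from a single sentiment label to aspect-level feedback can be sketched with a toy lexicon-based extractor. This is a minimal illustration, not the method from the Dataconomy article: the `ASPECTS` and `OPINIONS` lexicons and the clause-splitting rule are hypothetical assumptions, and a real system would learn aspects and opinion words from data.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons (assumptions for illustration only).
ASPECTS = {"battery", "screen", "price", "delivery"}
OPINIONS = {"great": 1, "good": 1, "bright": 1, "poor": -1, "slow": -1, "expensive": -1}

def extract_opinions(review):
    """Map each aspect to the (opinion word, polarity) pairs in the same clause."""
    result = defaultdict(list)
    # Crude clause split on punctuation and the conjunctions "but" / "and".
    clauses = re.split(r"[.,;!?]|\bbut\b|\band\b", review.lower())
    for clause in clauses:
        tokens = re.findall(r"[a-z]+", clause)
        opinions = [(t, OPINIONS[t]) for t in tokens if t in OPINIONS]
        for aspect in (t for t in tokens if t in ASPECTS):
            result[aspect].extend(opinions)
    return dict(result)

def negative_aspects(opinions):
    """Aspects whose summed opinion polarity is below zero, sorted by name."""
    return sorted(a for a, ops in opinions.items() if sum(p for _, p in ops) < 0)

review = "The screen is bright, but the battery is poor and delivery was slow."
found = extract_opinions(review)
print(found)                    # {'screen': [('bright', 1)], 'battery': [('poor', -1)], 'delivery': [('slow', -1)]}
print(negative_aspects(found))  # ['battery', 'delivery']
```

Unlike a single document-level score, the output points to which aspects (here battery and delivery) drive the negative sentiment, which is the kind of actionable feedback the client in the example is asking for.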
Smart Business: automated sentiment analysis on top
The modern world seems really fast and dynamic, with a multitude of new products being launched. Marketing agencies are making a fortune by monitoring the markets and delivering reports on consumers' opinions. Today, feedback analysis is a separate area, arguably a growing industry with an array of products and services, and the prices for those services are pretty exorbitant. Still, there is always an opportunity to start your own feedback collection and analysis.
Deep Reinforcement Learning for Multi-Domain Dialogue Systems
Cuayáhuitl, Heriberto, Yu, Seunghak, Williamson, Ashley, Carse, Jacob
Standard deep reinforcement learning methods such as Deep Q-Networks (DQN) for multiple tasks (domains) face scalability problems. We propose a method for multi-domain dialogue policy learning, termed NDQN, and apply it to an information-seeking spoken dialogue system in the domains of restaurants and hotels. Experimental results comparing DQN (baseline) versus NDQN (proposed) using simulations report that our proposed method exhibits better scalability and is promising for optimising the behaviour of multi-domain dialogue systems.
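For reference, the DQN baseline is trained by regressing Q(s, a) toward a bootstrapped target, y = r + γ max_a' Q_target(s', a'). The sketch below computes those standard targets for a batch of transitions. It shows only the baseline DQN update, since the abstract does not describe NDQN's internals; the function and argument names and the toy numbers are illustrative assumptions.

```python
from typing import Sequence

def dqn_targets(rewards: Sequence[float],
                next_q_values: Sequence[Sequence[float]],
                dones: Sequence[bool],
                gamma: float = 0.99) -> list:
    """Standard DQN regression targets: y = r + gamma * max_a' Q_target(s', a').

    next_q_values[i] holds the target network's Q-values for every action
    available in the next state of transition i; terminal transitions
    (done=True) do not bootstrap.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets

# Two toy transitions with two dialogue actions each (made-up numbers).
print(dqn_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.5))
# [2.0, 0.0]
```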
Structural Correspondence Learning for Cross-lingual Sentiment Classification with One-to-many Mappings
Li, Nana, Zhai, Shuangfei, Zhang, Zhongfei, Liu, Boying
Structural correspondence learning (SCL) is an effective method for cross-lingual sentiment classification. This approach uses unlabeled documents along with a word translation oracle to automatically induce task-specific, cross-lingual correspondences. It transfers knowledge by identifying important features, i.e., pivot features. For simplicity, however, it assumes that the word translation oracle maps each pivot feature in the source language to exactly one word in the target language. This one-to-one mapping between words in different languages is too strict, and context is not considered at all. In this paper, we propose a cross-lingual SCL based on distributed representations of words; it can learn meaningful one-to-many mappings for pivot words using large amounts of monolingual data and a small dictionary. We conduct experiments on the NLP&CC 2013 cross-lingual sentiment analysis dataset, using English as the source language and Chinese as the target language. Our method does not rely on parallel corpora, and the experimental results show that our approach is more competitive than state-of-the-art methods in cross-lingual sentiment classification.
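The one-to-many idea can be illustrated with a nearest-neighbour lookup: given word vectors for both languages in a shared space, each source pivot maps to the k most similar target words instead of a single dictionary translation. This is a hedged sketch, not the authors' procedure. The paper learns the cross-lingual mapping from monolingual data and a small dictionary, whereas here the shared space, the toy 2-D vectors, the function names, and the `k` and `min_sim` thresholds are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def one_to_many(pivot_vec, target_vocab, k=2, min_sim=0.5):
    """Map one source-language pivot to up to k similar target-language words."""
    scored = [(word, cosine(pivot_vec, vec)) for word, vec in target_vocab.items()]
    kept = [(w, s) for w, s in scored if s >= min_sim]
    kept.sort(key=lambda ws: ws[1], reverse=True)
    return [w for w, _ in kept[:k]]

# Toy vectors, assumed to already live in one shared cross-lingual space.
chinese_vocab = {
    "好": [0.9, 0.1],   # "good"
    "棒": [0.8, 0.3],   # "great"
    "差": [-0.9, 0.2],  # "bad"
}
# A hypothetical vector for the English pivot "excellent".
print(one_to_many([1.0, 0.2], chinese_vocab))  # ['好', '棒']
```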
Unraveling a Keras model
Keras is a great library for hands-on work with neural networks, and it has a ton of great examples that make it very easy to create ANNs & DNNs. So easy, in fact, that you could even build one without knowing what's going on. I used the CNN model from this Keras blog post to create a simple sentiment analysis model. But to fully understand what I had just done, I had to dig a little deeper. The basic model outlined in the post uses pre-trained word embeddings of the text to train a CNN for sentiment analysis. I have shown it below, with a few minor changes to padding sizes (border_mode='same') so that the convolution output size stays the same as its input (for simplicity).
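The effect of `border_mode='same'` follows from the output-length formula for a stride-1 convolution: with input length n, kernel size k, and p zeros of padding on each side, the output has n + 2p - k + 1 positions, so p = (k - 1)/2 keeps it at n for odd k. The pure-Python sketch below is not Keras code (the function names are hypothetical); it checks the formula on a single 1-D channel. Note that `border_mode` is the Keras 1 argument name; Keras 2 renamed it `padding`.

```python
def conv1d_valid(x, kernel):
    """Stride-1 1-D cross-correlation (what Keras conv layers compute), no padding."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]

def conv1d_same(x, kernel):
    """Like conv1d_valid, but zero-pads both ends so the output length equals len(x)."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    pad = (k - 1) // 2
    padded = [0] * pad + list(x) + [0] * pad
    return conv1d_valid(padded, kernel)

x = [1, 2, 3, 4, 5]
kernel = [1, 1, 1]
print(conv1d_valid(x, kernel))  # [6, 9, 12]       (length 5 - 3 + 1 = 3)
print(conv1d_same(x, kernel))   # [3, 6, 9, 12, 9] (length 5)
```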
Analyzing Vocabulary Intersections of Expert Annotations and Topic Models for Data Practices in Privacy Policies
Liu, Frederick (Carnegie Mellon University) | Wilson, Shomir (University of Cincinnati) | Schaub, Florian (University of Michigan) | Sadeh, Norman (Carnegie Mellon University)
Privacy policies are commonly used to inform users about the data collection and use practices of websites, mobile apps, and other products and services. However, the average Internet user struggles to understand the contents of these documents and generally does not read them. Natural language and machine learning techniques offer the promise of automatically extracting relevant statements from privacy policies to help generate succinct summaries, but current techniques require large amounts of annotated data. The highest-quality annotations require law experts, but their efforts do not scale efficiently. In this paper, we present results on bridging the gap between privacy practice categories defined by law experts and topics learned from Non-negative Matrix Factorization (NMF). To do this, we investigate the intersections between vocabulary sets identified as most significant for each category, using a logistic regression model, and vocabulary sets identified by topic modeling. The intersections exhibit strong matches between some categories and topics, although other categories have weaker affinities with topics. Our results show a path forward for applying unsupervised methods to the determination of data practice categories in privacy policy text.
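The intersection analysis can be sketched in a few lines: take the most heavily weighted terms for a category from a logistic regression model and the top terms of an NMF topic, then compare the two sets. In this hedged sketch the weights are made up, `k` is arbitrary, the function names are hypothetical, and Jaccard similarity is an assumed way to score the overlap; the paper's exact selection and scoring may differ. In practice the inputs could come from, for example, one row of a fitted scikit-learn `LogisticRegression.coef_` and one row of `NMF.components_`, each mapped back to vocabulary terms.

```python
def top_k(weights, k):
    """The k terms with the largest weights (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:k]}

def vocab_overlap(category_weights, topic_weights, k=5):
    """Intersection and Jaccard similarity of two top-k vocabulary sets."""
    cat, topic = top_k(category_weights, k), top_k(topic_weights, k)
    union = cat | topic
    shared = cat & topic
    return shared, (len(shared) / len(union) if union else 0.0)

# Made-up weights for a hypothetical "third-party sharing" category and one NMF topic.
logreg_coefs = {"third": 2.1, "parties": 1.9, "share": 1.7, "partners": 1.2, "cookies": -0.3}
nmf_topic = {"share": 0.8, "third": 0.7, "parties": 0.6, "affiliates": 0.5, "cookies": 0.1}
shared, jaccard = vocab_overlap(logreg_coefs, nmf_topic, k=4)
print(sorted(shared), jaccard)  # ['parties', 'share', 'third'] 0.6
```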