 Discourse & Dialogue


Pride and Prejudice and Z-scores

#artificialintelligence

You might think literary criticism is no place for statistical analysis, but given digital versions of the text you can, for example, use sentiment analysis to infer the dramatic arc of an Oscar Wilde novel. Now you can apply similar techniques to the works of Jane Austen thanks to Julia Silge's R package janeaustenr (available on CRAN). The package includes the full text of the 6 Austen novels, including Pride and Prejudice and Sense and Sensibility. With the novels' text in hand, Julia then applied Bing sentiment analysis (as implemented in R's syuzhet package), shown here with annotations marking the major dramatic turns in the book: There's quite a lot of noise in that chart, so Julia took the elegant step of using a low-pass Fourier transform to smooth the sentiment for all six novels, which allows for a comparison of the dramatic arcs: This is super interesting to me. Emma and Northanger Abbey have the most similar plot trajectories, with their tales of immature women who come to understand their own folly and grow up a bit.
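As a rough illustration of the smoothing step, here is a minimal Python sketch of a low-pass Fourier filter applied to a per-sentence sentiment series. It is not the syuzhet implementation: the function names and the `keep` parameter are assumptions made for this example, and a plain DFT is used so the block needs only the standard library.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform, O(n^2); fine for short series."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def low_pass_smooth(values, keep=3):
    """Zero every frequency above `keep`, then invert the transform.

    `keep` (hypothetical name) is the highest frequency index retained.
    Index k and index n - k form a conjugate pair, so both are kept
    to make sure the result is real-valued.
    """
    n = len(values)
    spectrum = dft(values)
    filtered = [
        coeff if (k <= keep or k >= n - keep) else 0
        for k, coeff in enumerate(spectrum)
    ]
    # Inverse DFT; the imaginary part is rounding noise only.
    return [
        (sum(filtered[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# Toy "sentiment" series: one slow arc plus a fast wiggle standing in for noise.
n = 64
arc = [math.sin(2 * math.pi * t / n) for t in range(n)]
noisy = [a + 0.5 * math.sin(2 * math.pi * 20 * t / n) for t, a in enumerate(arc)]
smoothed = low_pass_smooth(noisy, keep=3)
```

With `keep=3` the fast component is removed and only the slow arc remains, which is what makes the six novels' trajectories comparable on one chart.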


Text Analysis blog Aylien

#artificialintelligence

As you may be aware, we recently boosted our Text Analysis API offering with a cool new feature, Aspect-Based Sentiment Analysis. The whole idea behind Aspect-Based Sentiment Analysis (ABSA) is to provide a way for our users to extract specific aspects from a piece of text and determine the sentiment towards each aspect individually. We've built models for 4 different domains (industries). You can see the domains and the domain specific aspects listed in the image below. We explain it quickly and simply here to help get you up to speed.


Using sentiment analysis to predict ratings of popular tv series

#artificialintelligence

Unless you've been living under a rock for the last few years, you have probably heard of TV shows such as Breaking Bad, Mad Men, How I Met Your Mother or Game of Thrones. While I generally don't spend a whole lot of time watching TV, I have also undergone some pretty intense binge-watching sessions in the past (they generally coincided with exam periods, which was actually not a coincidence…). As I was watching the epic final season of Breaking Bad, it got me thinking about how TV series compare to one another, and how their ratings evolve over time. I therefore decided to look a bit further into user rating trends of popular TV series (and by popular I mean the ones I know). For this, I simply had to define a quick scraping function in R that retrieves the average IMDB user ratings assigned to each episode of a given series.


5 Key Challenges in Sentiment Analysis - P Plus Measurement Services

#artificialintelligence

As the adoption of sentiment analysis continues to spread across industries, from politics to PR, opinions about the field also run deep. That's especially true among practitioners, and a range of academic and vendor specialists weighed in at the Sentiment Analysis Symposium in New York last week. While the novelty factor begins to subside, clients are looking for more substance, and as befits such a multifaceted topic, it's complicated. As a follow-up to yesterday's post that covered the analysis of visual images and facial coding, here the experts offered their perspectives on approaching 5 ongoing issues: The degree-of-accuracy question is hard to answer, said Bing Liu, a University of Illinois at Chicago computer science professor specializing in data mining. It depends on what you're measuring, the level of text you're analyzing, the number of data sets across domains and the voice sound quality of videos, among other variables.


Sentiment Analysis APIs Benchmark MonkeyLearn Blog

#artificialintelligence

Sentiment analysis is a powerful example of how machine learning can help developers build better products with unique features. In short, sentiment analysis is the automated process of understanding if text written in a natural language (English, Spanish, etc.) is positive, neutral, or negative about a given subject. Nowadays, we have many instances where people express opinions and sentiment: tweets, comments, reviews, articles, chats, emails and more. One popular example is Twitter, where real-time opinions from millions of users are expressed constantly. Companies use sentiment analysis on Twitter to discover insights about their products and services.
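To make the positive / neutral / negative framing concrete, here is a toy lexicon-based classifier in Python. It is only a sketch: the word lists and the `classify` function are invented for illustration, and the APIs being benchmarked presumably rely on trained models rather than a handful of keywords.

```python
import re

# Tiny illustrative lexicons (assumptions, not taken from any real API).
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "poor"}


def classify(text):
    """Label text by counting positive and negative lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "I love this phone, great battery",
    "Terrible service and awful food",
    "The package arrived on Tuesday",
]
labels = [classify(t) for t in tweets]
```

A counter like this ignores negation, sarcasm and context, which is one reason hosted sentiment APIs are worth benchmarking against each other.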


Creating your first model

#artificialintelligence

Our aim is to create a "Machine Learning" platform that is simple to integrate yet powerful enough to provide a high-accuracy, low-latency API. Such a system provides Data Mining, Machine Learning and Artificial Intelligence algorithms as a service. The system can create a trained model from datasets uploaded as a training set and then classify similar datasets in the future using the saved models. "Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials." Download the sample "sentiment analysis" file. The first column should always be the label to be predicted.
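The label-first layout can be checked before uploading a file. The Python sketch below assumes the training file is comma-separated; the sample rows and the `load_labeled_rows` helper are hypothetical and are not part of the platform's API.

```python
import csv
import io

# Hypothetical sample in the described layout: label first, then the text.
SAMPLE = (
    "positive,I love this phone\n"
    "negative,The battery died in an hour\n"
)


def load_labeled_rows(handle):
    """Split each CSV row into (label, remaining columns)."""
    labels, features = [], []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        labels.append(row[0])
        features.append(row[1:])
    return labels, features


labels, features = load_labeled_rows(io.StringIO(SAMPLE))
```

To read a real file, pass `open(path, newline="")` instead of the in-memory sample.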


Sentiment Analysis of 11 Million Tweets from Apple Live 2014 - Going beyond positive and negative

@machinelearnbot

This post was originally published on our Text Analysis blog. It set out to analyze and visualize 11 million tweets collected around the time of and during Apple Live 2014. Apple Live probably got off to the worst start possible earlier this year. Most of us who tried to log on to watch the much-anticipated launch were first forced to watch the live feed in Safari and second greeted with the TV Truck Schedule Screen... To add to this, Apple also made a complete mess of the audio. We were left sitting refreshing the page, waiting for the stream to start while being subjected to an audiovisual nightmare, described brilliantly by this "fan" below: To simulate the #applelive experience, open up several separate YouTube vids, play them simultaneously, minimize, stare at a test pattern. At AYLIEN, we gathered 11 million tweets mentioning 'Apple', 'iPhone', 'iOS', 'iPad', 'Mac', 'iPod', 'Macbook', 'iCloud', 'OS X', 'iWatch' and '#AppleLive' from the 4th of September to the 10th of September with a view to analyzing the tweets to gain insight into the voice of Apple followers.


Sentiment Classification Using Negation as a Proxy for Negative Sentiment

AAAI Conferences

We explore the relationship between negated text and negative sentiment in the task of sentiment classification. We propose a novel adjustment factor based on negation occurrences as a proxy for negative sentiment that can be applied to lexicon-based classifiers equipped with a negation detection pre-processing step. We performed an experiment on a multi-domain customer reviews dataset obtaining accuracy improvements over a baseline, and we further improved our results using out-of-domain data to calibrate the adjustment factor. We see future work possibilities in exploring negation detection refinements, and expanding the experiment to a broader spectrum of opinionated discourse, beyond that of customer reviews.
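The abstract does not give the adjustment formula, so the Python sketch below uses one assumed form purely for illustration: a lexicon score with simple negation handling, minus a tunable penalty per detected negation. The lexicons, the negation scope rule and the `adjustment` parameter are all hypothetical.

```python
import re

# Illustrative lexicons and negators (assumptions, not from the paper).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "terrible", "broken"}
NEGATORS = {"not", "no", "never", "cannot"}


def lexicon_score(text, adjustment=0.0):
    """Lexicon polarity score minus `adjustment` per negation found.

    Negation detection here is deliberately crude: a negator flips
    the polarity of the next lexicon word that follows it.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    negations = 0
    negate_next = False
    for tok in tokens:
        if tok in NEGATORS or tok.endswith("n't"):
            negations += 1
            negate_next = True
            continue
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity:
            total += -polarity if negate_next else polarity
            negate_next = False
    # Assumed adjustment: each negation counts as weak negative evidence.
    return total - adjustment * negations


review = "The screen is good but it does not last and never charges"
baseline = lexicon_score(review)                    # lexicon only
adjusted = lexicon_score(review, adjustment=0.75)   # with negation penalty
```

In this toy review the baseline sees only the word "good" and scores it as positive, while the penalty for the two negations pushes the adjusted score below zero. Calibrating `adjustment` on held-out data is the analogue of the calibration step the abstract mentions.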


Ultradense Word Embeddings by Orthogonal Transformation

arXiv.org Artificial Intelligence

Embeddings are generic representations that are useful for many NLP tasks. In this paper, we introduce DENSIFIER, a method that learns an orthogonal transformation of the embedding space that focuses the information relevant for a task in an ultradense subspace of a dimensionality that is smaller by a factor of 100 than the original space. We show that ultradense embeddings generated by DENSIFIER reach state of the art on a lexicon creation task in which words are annotated with three types of lexical information - sentiment, concreteness and frequency. On the SemEval2015 10B sentiment analysis task we show that no information is lost when the ultradense subspace is used, but training is an order of magnitude more efficient due to the compactness of the ultradense space.
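The idea of an orthogonal transformation that concentrates task information in a few dimensions can be pictured in two dimensions. The Python sketch below is not DENSIFIER: the rotation is chosen by hand rather than learned, and the three word vectors are invented so that sentiment lies along the diagonal of the original space.

```python
import math

# Hypothetical 2-d embeddings: sentiment varies along the (1, 1) direction.
EMBEDDINGS = {
    "good": (1.0, 1.0),
    "bad": (-1.0, -1.0),
    "table": (1.0, -1.0),
}


def rotate(vec, theta):
    """Apply a 2-d rotation, the simplest orthogonal transformation."""
    x, y = vec
    return (
        math.cos(theta) * x - math.sin(theta) * y,
        math.sin(theta) * x + math.cos(theta) * y,
    )


def ultradense(word, theta=-math.pi / 4):
    """Rotate, then keep only the first coordinate (a 1-d subspace)."""
    return rotate(EMBEDDINGS[word], theta)[0]


def norm(vec):
    return math.hypot(*vec)


scores = {word: ultradense(word) for word in EMBEDDINGS}
```

After the rotation the single retained coordinate separates "good" from "bad" and is close to zero for "table", while vector lengths are unchanged because the transformation is orthogonal.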


Comparing Approaches for Combining Data Sampling and Feature Selection to Address Key Data Quality Issues in Tweet Sentiment Analysis

AAAI Conferences

When training tweet sentiment classifiers, many data quality challenges must be addressed. One potential issue is class imbalance, where most instances belong to a single majority class. This may negatively impact classifier performance as classifiers trained on imbalanced data may favor classification of new, unseen instances as belonging to the majority class. This issue is accompanied by a second challenge, high dimensionality, since very large numbers of text-based features are used to describe tweet datasets. For datasets where both of these challenges are present, we can combine feature selection and data sampling to address both high dimensionality and class imbalance. However, three potential approaches exist for combining data sampling and feature selection and it is unclear which approach is optimal. In this paper, we seek to determine if there is a best approach for combining data sampling and feature selection. We conduct tests using random undersampling with two post-sampling class ratios (50:50 and 35:65) combined with three feature rankers. Classifiers are trained with each potential combination approach using seven different learners on two datasets. We found that, overall, classifiers trained by performing feature selection followed by data sampling performed better than the other two approaches; however, the differences were only significant for the more imbalanced dataset.
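Random undersampling to a target class ratio can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: it handles the binary case only, and the function name, the `minority_share` parameter and the toy data are invented here. Feature selection, which the abstract reports works best when done before sampling, is left out.

```python
import random
from collections import Counter


def random_undersample(rows, labels, minority_share=0.5, seed=0):
    """Drop random majority-class rows until the minority class makes up
    `minority_share` of the data (0.5 gives 50:50, 0.35 gives 35:65)."""
    counts = Counter(labels)
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    # Majority rows to keep so that minority : majority matches the target.
    target_major = min(n_major, round(n_minor * (1 - minority_share) / minority_share))
    rng = random.Random(seed)
    major_idx = [i for i, y in enumerate(labels) if y == majority]
    keep = set(rng.sample(major_idx, target_major))
    keep.update(i for i, y in enumerate(labels) if y == minority)
    order = sorted(keep)  # preserve the original row order
    return [rows[i] for i in order], [labels[i] for i in order]


# Toy imbalanced dataset: 35 negative tweets, 200 positive tweets.
rows = [[i] for i in range(235)]
labels = ["neg"] * 35 + ["pos"] * 200

_, balanced = random_undersample(rows, labels, minority_share=0.5)
_, skewed = random_undersample(rows, labels, minority_share=0.35)
```

All minority rows are kept in both cases; only the number of retained majority rows changes with the target ratio.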