 Information Extraction


Creating a sentiment analysis model with Scrapy and MonkeyLearn MonkeyLearn Blog

#artificialintelligence

We are in an era of data explosion, where millions of tweets, articles, comments, reviews and the like are published every day. Developers are taking advantage of this abundance of data and using techniques like web scraping to do all kinds of cool things. Sometimes web scraping is not enough; digging deeper and analyzing the data is often needed to unlock its meaning and discover valuable insights. In this tutorial we will cover how you can use MonkeyLearn and Scrapy to build a machine learning model that will help you analyze vast amounts of web-scraped data in a cost-effective way. We will use Scrapy to extract hotel reviews from TripAdvisor and use those reviews as training samples to create a machine learning model with MonkeyLearn.


The Hidden Cost of Big-Ticket Text Analytics: Time

#artificialintelligence

The inspiration for this week's clip in our "Get the Job Done!" series is the big-ticket procurement and implementation process, and all of those folks whose opinions you don't need. We hear all the time from prospective clients who've found themselves bogged down in the painful, protracted process of getting buy-in for enterprise text analytics platforms that offer something for everyone and come with a six-figure price tag. Oftentimes, this procurement process involves people in the organization who have lots of opinions but no research expertise and who, in some cases, won't even be using the purchase in question. Worse yet, after everyone has had their say and the purchase has finally gone through, the original intended user finds the whole initiative mired in a lengthy, complicated implementation! It's 2017 and the one thing no one can afford to waste is time.


Sentiment Analysis of Movie Reviews (3): doc2vec

@machinelearnbot

This is the last installment, for now, of my mini-series on sentiment analysis of the Stanford collection of IMDB reviews (originally published on recurrentnull.wordpress.com). So far, we've had a look at classical bag-of-words models and word vectors (word2vec). We saw that of the classifiers used, logistic regression performed best, whether combined with bag-of-words or word2vec. We also saw that while the word2vec model did in fact capture semantic dimensions, it was less successful for classification than bag-of-words, and we attributed that to the averaging of word vectors we had to perform to obtain input features at the review (not word) level. So the question now is: How would distributed representations perform if we did not have to throw away information by averaging word vectors?
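
The information loss from averaging can be illustrated with a minimal sketch. It is not the author's code: the three-dimensional vectors, the `WORD_VECTORS` table, and the `average_vector` helper are all made up for illustration, and a real word2vec model would supply learned vectors with far more dimensions.

```python
# Toy 3-dimensional "word vectors"; the values are invented for illustration
# and do not come from any trained word2vec model.
WORD_VECTORS = {
    "great": [1.0, 0.0, 0.5],
    "boring": [-1.0, 0.0, 0.5],
    "movie": [0.0, 1.0, 0.0],
}


def average_vector(tokens, vectors=WORD_VECTORS, dim=3):
    """Return the element-wise mean of the vectors of all known tokens.

    Unknown tokens are skipped; a review with no known tokens maps to zeros.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Two reviews with the same words in a different order.
review_a = ["great", "movie", "boring"]
review_b = ["boring", "movie", "great"]

# Averaging ignores word order, so both reviews get identical features.
print(average_vector(review_a) == average_vector(review_b))
```

Because the mean is order-independent, any signal carried by word order or phrasing is gone before the classifier sees the review, which is the limitation the paragraph above describes.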


Text Analytics with Python trending on GitHub

#artificialintelligence

Hi all, so I am a big fan of open source and definitely love the GitHub ecosystem, which allows us to upload and share excellent software, research and inventions with people all over the world and in turn welcomes them to improve on existing repositories. GitHub has an excellent feature called "Trending in open source" where you can see trending repositories by language and time. I check it every once in a while to see exciting content being posted by users or, as GitHub puts it, "See what the GitHub community is most excited about today." The book was launched officially last week and is available on all major distribution channels. For more information you can check out my other post.


Gov't requests for Facebook data up 27 percent

FOX News

Governments worldwide requested Facebook users' data nearly 60,000 times in the first half of 2016, a 27 percent increase over requests made in the second half of 2015, according to a Facebook bi-annual report published this week. In addition to government requests for user data, the report details which content Facebook restricts for violating local laws. The company says it studies each request carefully to determine whether or not it has merit, especially in emergency cases where imminent risk of serious injury or harm is involved. It ultimately handed over data in 80 percent of cases. The 27 percent jump for the latest reporting period compares to a 13 percent increase between the first and second halves of 2015, and 18 percent growth between the second half of 2014 and the first half of 2015.


Opinion Mining - Sentiment Analysis and Beyond

@machinelearnbot

So you report, with reasonable accuracy, what the sentiment about a particular brand or product is. After publishing this report, your client comes back to you and says "Hey this is good. Now can you tell me ways in which I can convert the negative sentiments into positive sentiments?" Sentiment Analysis stops there and we enter the realm of Opinion Mining. Opinion Mining is about having a deeper understanding of the review that was written. Typically, a detailed review will not just have a sentiment attached to it. It will have information and valuable feedback that can help to build the next strategy.


Sentiment Analysis of Movie Reviews (1): Bag-of-Words Models

@machinelearnbot

Imagine I show you a book review on amazon.com. Imagine I hide the number of stars, so all you get to see is the review text. And now I'm asking you: that review, is it good or bad? Well, it should be easy for humans (although depending on the input there can be lots of disagreement between humans, too). But if you want to do it automatically, it turns out to be surprisingly difficult.


Sentiment Analysis of Movie Reviews (2): word2vec

@machinelearnbot

This is the continuation of my mini-series on sentiment analysis of movie reviews, which originally appeared on recurrentnull.wordpress.com. Last time, we had a look at how well classical bag-of-words models worked for classification of the Stanford collection of IMDB reviews. As it turned out, the "winner" was Logistic Regression, using both unigrams and bigrams for classification. The best classification accuracy obtained was 0.89. So, bag-of-words models may be surprisingly successful, but they are limited in what they can do.
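
As a rough illustration of what "using both unigrams and bigrams" means for the feature set, here is a minimal standard-library sketch. The `ngram_counts` helper and the whitespace tokenization are assumptions made for this example, not the author's pipeline, and the logistic regression step is left out.

```python
from collections import Counter


def ngram_counts(text, max_n=2):
    """Count all word n-grams of length 1..max_n in a lowercased text.

    Tokenization is a plain whitespace split, which is a simplification.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


features = ngram_counts("not good not bad")

# Unigrams alone see "not", "good" and "bad"; bigrams add "not good" and
# "not bad", which carry the negation that single words miss.
for ngram, count in sorted(features.items()):
    print(ngram, count)
```

Each review becomes a sparse vector of such counts, and a classifier like logistic regression learns one weight per n-gram.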


A machine-learning system that trains itself by surfing the web

#artificialintelligence

Most successful information extraction systems operate with access to a large collection of documents. In this work, we explore the task of acquiring and incorporating external evidence to improve extraction accuracy in domains where training data is scarce. This process entails issuing search queries, extracting from new sources and reconciling the extracted values, steps which are repeated until sufficient evidence is collected. We approach the problem using a reinforcement learning framework where our model learns to select optimal actions based on contextual information. We employ a deep Q-network, trained to optimize a reward function that reflects extraction accuracy while penalizing extra effort. Our experiments on two databases, one of shooting incidents and one of food adulteration cases, demonstrate that our system significantly outperforms traditional extractors and a competitive meta-classifier baseline.
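
The idea of a reward that tracks accuracy while charging for effort can be sketched with a tabular Q-learning update. This is a simplified stand-in, not the paper's deep Q-network: the `reward` shape, the `step_penalty` value, the state and action names, and the hyperparameters are all assumptions made for illustration.

```python
def reward(accuracy_before, accuracy_after, step_penalty=0.1):
    """Hypothetical reward: accuracy gain minus a fixed cost per extra step."""
    return (accuracy_after - accuracy_before) - step_penalty


def q_update(q, state, action, r, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one standard Q-learning update to the table `q` in place.

    `q` maps (state, action) pairs to value estimates; missing pairs count as 0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (r + gamma * best_next - old)
    return q[(state, action)]


# Illustrative action set: issue another query or stop collecting evidence.
ACTIONS = ["query", "stop"]
q_table = {}

# One step in which a new query raised accuracy from 0.5 to 0.75.
r = reward(0.5, 0.75, step_penalty=0.125)
new_value = q_update(q_table, "s0", "query", r, "s1", ACTIONS)
print(r, new_value)
```

A deep Q-network replaces the lookup table with a neural network that estimates the same values from contextual features, but the update target has the same form.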