 Information Extraction


Mental Health Alerts via Facebook? - The Crux

#artificialintelligence

Every day, 730,000 comments and 420 billion statuses are posted on Facebook, 500 billion 140-character tweets are posted and 430,000 hours of new video are uploaded to YouTube. The Internet is a goldmine of data just waiting to be analyzed. As social media has crept deeper and deeper into our daily lives, governments and advertisers have been using this data for myriad purposes. Now, a team of researchers at the University of Ottawa, the University of Alberta and the Université de Montpellier in France is examining ways to use social media data to detect and monitor people who are potentially at risk of mental health issues. Using computer algorithms, the team will apply social web mining and "sentiment analysis methods" to troves of data generated through social media to detect at-risk individuals. Sentiment analysis is the process of using a computer program to identify and categorize opinions expressed in text.


Integrating Stanford CoreNLP with Talend Studio Datalytyx

@machinelearnbot

In my previous blog post, Twitter Sentiment Analysis using Talend, I showed how to extract tweets from Twitter using Talend and then how to do some basic sentiment analysis on those tweets. In this post, I will introduce the Stanford CoreNLP toolkit and show how to integrate it with Talend to perform various NLP (Natural Language Processing) analyses, including sentiment analysis. Previously I had managed to perform some basic sentiment analysis on tweets. However, I'd noticed a major flaw with my technique: the method I was using would take each word in a sentence and average the sentiment scores of the individual words. I explain the issue in more detail in my original post, but to give you a flavour of it, I'll show you some examples of correct/incorrect sentiment identification that would result from my previous method: This is incorrect as it should be fairly obvious that this sentence carries negative sentiment.
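
To make the flaw concrete, here is a minimal sketch of word-by-word averaging. It is not the author's Talend job: the tiny lexicon, the function names and the example sentences are all hypothetical, and the sketch assumes that words missing from the lexicon are simply skipped.

```python
# Hypothetical toy lexicon (an assumption for illustration):
# word -> sentiment score between -1 and 1.
LEXICON = {
    "good": 1.0,
    "great": 1.0,
    "happy": 0.8,
    "bad": -1.0,
    "terrible": -1.0,
}


def average_word_sentiment(sentence):
    """Average the lexicon scores of the known words in a sentence."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    # Sentences with no known words are treated as neutral (0.0).
    return sum(scores) / len(scores) if scores else 0.0


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical example sentences.
for text in ["The film was great.", "The film was not good at all."]:
    print(text, "->", label(average_word_sentiment(text)))
```

Because "not" carries no score of its own and word order is ignored, the second sentence is labelled by the word "good" alone. That is the kind of error a sentence-level model is meant to avoid.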


Pride and Prejudice and Z-scores

#artificialintelligence

You might think literary criticism is no place for statistical analysis, but given digital versions of the text you can, for example, use sentiment analysis to infer the dramatic arc of an Oscar Wilde novel. Now you can apply similar techniques to the works of Jane Austen thanks to Julia Silge's R package janeaustenr (available on CRAN). The package includes the full text of the 6 Austen novels, including Pride and Prejudice and Sense and Sensibility. With the novels' text in hand, Julia then applied Bing sentiment analysis (as implemented in R's syuzhet package), shown here with annotations marking the major dramatic turns in the book: There's quite a lot of noise in that chart, so Julia took the elegant step of using a low-pass Fourier transform to smooth the sentiment for all six novels, which allows for a comparison of the dramatic arcs: This is super interesting to me. Emma and Northanger Abbey have the most similar plot trajectories, with their tales of immature women who come to understand their own folly and grow up a bit.
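
The smoothing idea can be sketched in a few lines. This is not the syuzhet implementation and not Julia's R code: it is a plain discrete Fourier transform written in Python, with hypothetical function names and a `keep` cutoff parameter chosen for illustration. The series is transformed, every component above the cutoff is zeroed, and the result is transformed back.

```python
import cmath
import math


def dft(values):
    """Discrete Fourier transform of a real-valued series."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse discrete Fourier transform."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def low_pass(values, keep):
    """Keep the mean and the `keep` lowest frequencies; drop the rest."""
    n = len(values)
    coeffs = dft(values)
    for k in range(n):
        # For real input, indices k and n - k describe the same frequency.
        if min(k, n - k) > keep:
            coeffs[k] = 0
    return [c.real for c in inverse_dft(coeffs)]


# Synthetic stand-in for a per-section sentiment series:
# one slow "arc" plus a faster wobble that plays the role of noise.
n = 32
arc = [math.sin(2 * math.pi * t / n) for t in range(n)]
series = [a + 0.5 * math.sin(2 * math.pi * 8 * t / n) for t, a in enumerate(arc)]
smoothed = low_pass(series, keep=3)
```

A smaller `keep` gives a smoother curve. Since every novel is reduced to the same handful of low-frequency components, the resulting arcs can be compared directly.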


Data Science: The numbers game Law almost lost.

@machinelearnbot

On the face of it, Analytics and Law are manifestly divergent fields of practice. One need only consider the nature of Algorithms that require numerical attributes for their calculations and the textual rigidity of substantive law to realize this. The very first obstacle one will encounter in applying Analytics to Law is the absence of calculable numerical variables in raw legal data. No judicial precedent, statute or common law principle has ever been reduced to a mathematically sound numerical expression; raw legal data is simply not Analytics-receptive. There are, however, some methods of mining raw legal data, such as powerful Text Analytics, that make it possible to build reasonably accurate classification, sentiment analysis and many other models.
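
One common way to obtain numerical attributes from raw text is a bag-of-words count, sketched below. This is an illustrative assumption rather than a method the article names; the function names, the sample sentence and the vocabulary are all hypothetical.

```python
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphabetic tokens in a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def to_vector(text, vocabulary):
    """Turn text into a numeric vector: one count per vocabulary term."""
    counts = term_counts(text)
    return [counts[term] for term in vocabulary]


# Hypothetical sentence and vocabulary, for illustration only.
vocabulary = ["contract", "court", "the", "tort"]
vector = to_vector("The court held that the contract was void.", vocabulary)
print(vector)
```

Each document becomes a fixed-length list of numbers, which is the kind of input that classification algorithms expect.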


Text Analysis blog Aylien

#artificialintelligence

As you may be aware, we recently boosted our Text Analysis API offering with a cool new feature, Aspect-Based Sentiment Analysis. The whole idea behind Aspect-Based Sentiment Analysis (ABSA) is to provide a way for our users to extract specific aspects from a piece of text and determine the sentiment towards each aspect individually. We've built models for 4 different domains (industries). You can see the domains and the domain specific aspects listed in the image below. We explain it quickly and simply here to help get you up to speed.


Predicting Eurovision 2016 from Twitter data…

#artificialintelligence

This is the 2016 version of the Eurovision prediction. I explained the method in some detail in last year's post, which you can find here. In short, I measured how many tweets were sent from each country about each song. From this, I estimated the number of votes that each country would give to another. For example, if Germans tweet the most about the Polish song, I assume that Germany will give Poland 12 points.
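
The mapping from tweet counts to points can be sketched as follows. The excerpt only confirms that the most-tweeted song receives 12 points. This sketch assumes the remaining ranks follow the standard Eurovision scale (10, 8, 7, ... 1) and that a country's tweets about its own song are ignored, since countries cannot vote for themselves. The tweet counts and function names are made up for illustration.

```python
# Standard Eurovision scale; applying it to ranks below the top
# is an assumption, not something stated in the post.
EUROVISION_POINTS = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]


def predict_points(tweet_counts):
    """Map {voter: {song_country: tweets}} to {voter: {song_country: points}}."""
    predictions = {}
    for voter, counts in tweet_counts.items():
        # A country cannot award points to its own song.
        candidates = {c: n for c, n in counts.items() if c != voter}
        # Most-tweeted first; ties are broken alphabetically.
        ranked = sorted(candidates, key=lambda c: (-candidates[c], c))
        predictions[voter] = dict(zip(ranked, EUROVISION_POINTS))
    return predictions


def total_scores(predictions):
    """Sum the predicted points each song receives from all voters."""
    totals = {}
    for awarded in predictions.values():
        for country, points in awarded.items():
            totals[country] = totals.get(country, 0) + points
    return totals


# Made-up tweet counts, for illustration only.
tweets = {
    "Germany": {"Poland": 500, "Sweden": 300, "Germany": 900},
    "Poland": {"Sweden": 200, "Germany": 100},
}
predicted = predict_points(tweets)
scoreboard = total_scores(predicted)
```

With these made-up numbers, Germans tweet most about the Polish song, so Germany is predicted to give Poland 12 points.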


Facebook Data Firms Are Being Awfully Quiet On The 'Trending Topics' Story

International Business Times

Just when you need Big Data, it's nowhere to be found. After Facebook made headlines this week for allegedly meddling with its Trending Topics section, several analytics firms that have provided International Business Times with social media data in the past declined to provide numbers related to the ruckus. The kerfuffle was kicked off by a Gizmodo report alleging the company's Trending Topics section suppresses conservative topics of interest, thanks to the whims of its curators. Within hours of the news, the U.S. Senate Committee on Commerce wrote a letter to CEO Mark Zuckerberg asking representatives of Facebook to travel to Washington for a briefing on its curation guidelines. And Thursday, Facebook released its full guidelines for news selection, showing the extent to which human judgment is part of the process.


[Video] How Machine Learning Amplifies Inequality in Society

#artificialintelligence

In this talk, Mike Williams, Research Engineer at Fast Forward Labs, looks at how supervised machine learning has the potential to amplify power and privilege in society. Using sentiment analysis, he demonstrates how text analytics often favors the voices of men. Mike discusses how bias can inadvertently be introduced into any model, and how to recognize and mitigate these harms.


White paper: Making the business case for text analytics

@machinelearnbot

Unstructured data is the most prevalent form of information on the planet. It exists in our e-mails, surveys, social media accounts, call center logs, etc. With a strong text analytics strategy in place, companies can get critical information from this data to drive better business decisions.


Using sentiment analysis to predict ratings of popular TV series

#artificialintelligence

Unless you've been living under a rock for the last few years, you have probably heard of TV shows such as Breaking Bad, Mad Men, How I Met Your Mother or Game of Thrones. While I generally don't spend a whole lot of time watching TV, I have also undergone some pretty intense binge-watching sessions in the past (they generally coincided with exam periods, which was actually not a coincidence…). As I was watching the epic final season of Breaking Bad, it got me thinking about how TV series compare to one another, and how their ratings evolve over time. I therefore decided to look a bit further into user rating trends of popular TV series (and by popular I mean the ones I know). For this, I simply had to define a quick scraping function in R that retrieves the average IMDb user ratings assigned to each episode of a given series.