 Information Extraction


indico Named Boston's Best Tech Startup at 2nd Annual Timmy Awards

#artificialintelligence

About indico: indico provides state-of-the-art machine learning algorithms for text and image analysis in the form of a simple-to-use web service. This, for the first time, enables companies, regardless of their size or capability, to automatically extract meaningful insight from unstructured data. Sentiment Analysis, Social Media Monitoring, Content Filtering, Content Classification, Recommendation, and Personalization are just some of the areas in which indico's customers are deploying its technology to improve business outcomes. Furthermore, indico's rapid customization capabilities have enabled companies such as Mavrck, CO Everywhere, and interlinkONE to quickly develop compelling new solutions that weren't practical before.


Data alignment join in Java for easier text analytics

@machinelearnbot

A database's join statements make an alignment join convenient to perform. But sometimes the data is stored in text files, and computing the join in Java alone requires writing a large number of loop statements, which makes the code cumbersome. Using esProc alongside Java can solve the problem easily and quickly. Let's look at how this works through an example.
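To make the loop-heavy approach concrete, here is a minimal plain-Java sketch (not esProc). It assumes that "alignment join" means arranging the rows of one text file in the order of a reference key list and leaving a gap (null) where a key has no row. The class name, the method name, and the "key,value" row format are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not taken from the article or from esProc.
class AlignmentJoinSketch {

    // Aligns "key,value" rows to the order of referenceKeys.
    // A key with no matching row yields null at that position.
    static List<String> align(List<String> referenceKeys, List<String> rows) {
        Map<String, String> valueByKey = new HashMap<>();
        for (String row : rows) {
            String[] parts = row.split(",", 2);
            if (parts.length == 2) {
                // Keep the first value seen for a key (an assumption).
                valueByKey.putIfAbsent(parts[0].trim(), parts[1].trim());
            }
        }
        List<String> aligned = new ArrayList<>();
        for (String key : referenceKeys) {
            aligned.add(valueByKey.get(key)); // null when the key is absent
        }
        return aligned;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("A", "B", "C");
        List<String> rows = Arrays.asList("C,30", "A,10");
        System.out.println(align(keys, rows)); // prints [10, null, 30]
    }
}
```

In practice the rows would be read from the text file, for example with java.nio.file.Files.readAllLines, before being passed to align.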


Gated Neural Networks for Targeted Sentiment Analysis

AAAI Conferences

Targeted sentiment analysis classifies the sentiment polarity towards each target entity mention in given text documents. Seminal methods extract manual discrete features from automatic syntactic parse trees in order to capture semantic information of the enclosing sentence with respect to a target entity mention. Recently, it has been shown that competitive accuracies can be achieved without using syntactic parsers, which can be highly inaccurate on noisy text such as tweets. This is achieved by applying distributed word representations and rich neural pooling functions over a simple and intuitive segmentation of tweets according to target entity mentions. In this paper, we extend this idea by proposing a sentence-level neural model to address the limitation of pooling functions, which do not explicitly model tweet-level semantics. First, a bi-directional gated neural network is used to connect the words in a tweet so that pooling functions can be applied over the hidden layer instead of words for better representing the target and its contexts. Second, a three-way gated neural network structure is used to model the interaction between the target mention and its surrounding contexts. Experiments show that our proposed model gives significantly higher accuracies compared to the current best method for targeted sentiment analysis.


Personalized Microblog Sentiment Classification via Multi-Task Learning

AAAI Conferences

Microblog sentiment classification is an interesting and important research topic with wide applications. Traditional microblog sentiment classification methods usually use a single model to classify the messages from different users and ignore individual differences. However, microblogging users frequently embed their personal character, opinion bias and language habits into their messages, and the same word may convey different sentiments in messages posted by different users. In this paper, we propose a personalized approach for microblog sentiment classification. In our approach, each user has a personalized sentiment classifier, which is decomposed into two components, a global one and a user-specific one. Our approach can capture the individual personality and at the same time leverage the common sentiment knowledge shared by all users. The personalized sentiment classifiers of a massive number of users are trained in a collaborative way based on multi-task learning to handle the data sparseness problem. In addition, we incorporate users' social relations into our model to strengthen the learning of the personalized models. Moreover, we propose a distributed optimization algorithm to solve our model in parallel. Experiments on two real-world microblog sentiment datasets validate that our approach can improve microblog sentiment classification accuracy effectively and efficiently.
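The decomposition into a global and a user-specific component can be sketched as follows. This is only an illustration under a strong assumption: a linear classifier whose weight vector for a user is the sum of shared weights and per-user weights. The abstract does not state the model form, and the multi-task training, social-relation terms, and distributed optimization are omitted. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "global + user-specific" scoring.
// Assumes a linear model; the paper's actual formulation may differ.
class PersonalizedClassifierSketch {
    private final double[] globalWeights;
    private final Map<String, double[]> userWeights = new HashMap<>();

    PersonalizedClassifierSketch(double[] globalWeights) {
        this.globalWeights = globalWeights.clone();
    }

    // Stores the user-specific weights; must match the length of globalWeights.
    void setUserWeights(String userId, double[] weights) {
        userWeights.put(userId, weights.clone());
    }

    // Score = (global + user-specific) weights dotted with the features.
    // Users without their own weights fall back to the global model.
    double score(String userId, double[] features) {
        double[] user = userWeights.get(userId);
        double sum = 0.0;
        for (int i = 0; i < features.length; i++) {
            double w = globalWeights[i] + (user == null ? 0.0 : user[i]);
            sum += w * features[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Feature 1 is negative for most users but positive for user "u1".
        PersonalizedClassifierSketch c =
                new PersonalizedClassifierSketch(new double[] {1.0, -1.0});
        c.setUserWeights("u1", new double[] {0.0, 2.0});
        double[] x = {1.0, 1.0};
        System.out.println(c.score("someone-else", x)); // prints 0.0
        System.out.println(c.score("u1", x));           // prints 2.0
    }
}
```

The example shows how the same features can receive a different score for a user whose word usage departs from the shared model.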


Improving Twitter Sentiment Classification Using Topic-Enriched Multi-Prototype Word Embeddings

AAAI Conferences

It has been shown that learning distributed word representations is highly useful for Twitter sentiment classification. Most existing models rely on a single distributed representation for each word. This is problematic for sentiment classification because words are often polysemous and each word can contain different sentiment polarities under different topics. We address this issue by learning topic-enriched multi-prototype word embeddings (TMWE). In particular, we develop two neural networks which 1) learn word embeddings that better capture tweet context by incorporating topic information, and 2) learn topic-enriched multiple prototype embeddings for each word. Experiments on Twitter sentiment benchmark datasets in SemEval 2013 show that TMWE outperforms the top system with hand-crafted features, and the current best neural network model.


Microsummarization of Online Reviews: An Experimental Study

AAAI Conferences

Mobile and location-based social media applications provide platforms for users to share brief opinions about products, venues, and services. These quickly typed opinions, or microreviews, are a valuable source of current sentiment on a wide variety of subjects. However, there is currently little research on how to mine this information to present it back to users in an easily consumable way. In this paper, we introduce the task of microsummarization, which combines sentiment analysis, summarization, and entity recognition in order to surface key content to users. We explore unsupervised and supervised methods for this task, and find we can reliably extract relevant entities and the sentiment targeted towards them using crowdsourced labels as supervision. In an end-to-end evaluation, we find our best-performing system is vastly preferred by judges over a traditional extractive summarization approach. This work motivates an entirely new approach to summarization, incorporating both sentiment analysis and item extraction for modernized, at-a-glance presentation of public opinion.


Acquiring Knowledge of Affective Events from Blogs Using Label Propagation

AAAI Conferences

Many common events in our daily life affect us in positive and negative ways. For example, going on vacation is typically an enjoyable event, while being rushed to the hospital is an undesirable event. In narrative stories and personal conversations, recognizing that some events have a strong affective polarity is essential to understand the discourse and the emotional states of the affected people. However, current NLP systems mainly depend on sentiment analysis tools, which fail to recognize many events that are implicitly affective based on human knowledge about the event itself and cultural norms. Our goal is to automatically acquire knowledge of stereotypically positive and negative events from personal blogs. Our research creates an event context graph from a large collection of blog posts and uses a sentiment classifier and semi-supervised label propagation algorithm to discover affective events. We explore several graph configurations that propagate affective polarity across edges using local context, discourse proximity, and event-event co-occurrence. We then harvest highly affective events from the graph and evaluate the agreement of the polarities with human judgements.
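The idea of propagating affective polarity across graph edges can be illustrated with a generic label propagation sketch. It assumes a small weighted graph in which a few seed events carry a fixed polarity (+1 or -1) and every other event repeatedly takes the weighted average of its neighbors' scores. This is a textbook variant, not necessarily the algorithm or graph configuration used in the paper, and all names and example events are hypothetical.

```java
// Hypothetical illustration of semi-supervised label propagation.
// Not the paper's implementation; a generic clamped-seed variant.
class LabelPropagationSketch {

    // weights[i][j]: edge weight between events i and j (0 if no edge).
    // seeds[i]: +1.0 or -1.0 for seed events, Double.NaN for unlabeled ones.
    static double[] propagate(double[][] weights, double[] seeds, int iterations) {
        int n = seeds.length;
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            scores[i] = Double.isNaN(seeds[i]) ? 0.0 : seeds[i];
        }
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                if (!Double.isNaN(seeds[i])) {
                    next[i] = seeds[i]; // seed polarities stay clamped
                    continue;
                }
                double total = 0.0;
                double weightSum = 0.0;
                for (int j = 0; j < n; j++) {
                    total += weights[i][j] * scores[j];
                    weightSum += weights[i][j];
                }
                // Isolated unlabeled nodes stay neutral.
                next[i] = weightSum == 0.0 ? 0.0 : total / weightSum;
            }
            scores = next;
        }
        return scores;
    }

    public static void main(String[] args) {
        // 0: "go on vacation" (seed +1), 1: "pack a suitcase" (unlabeled),
        // 2: "rushed to the hospital" (seed -1). Event 1 co-occurs with event 0 only.
        double[][] weights = {
            {0.0, 1.0, 0.0},
            {1.0, 0.0, 0.0},
            {0.0, 0.0, 0.0}
        };
        double[] seeds = {1.0, Double.NaN, -1.0};
        double[] scores = propagate(weights, seeds, 10);
        System.out.println(scores[1]); // positive: inherited from event 0
    }
}
```

Events whose final score is strongly positive or negative would then be the candidates to harvest as affective events.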


Short Text Representation for Detecting Churn in Microblogs

AAAI Conferences

Churn happens when a customer leaves a brand or stops using its services. Brands reduce their churn rates by identifying and retaining potential churners through customer retention campaigns. In this paper, we consider the problem of classifying micro-posts as churny or non-churny with respect to a given brand. Motivated by the recent success of recurrent neural networks (RNNs) in word representation, we propose to utilize RNNs to learn micro-post and churn indicator representations. We show that such representations improve the performance of churn detection in microblogs and lead to more accurate ranking of churny contents. Furthermore, in this research we show that state-of-the-art sentiment analysis approaches fail to identify churny contents. Experiments on Twitter data about three telco brands show the utility of our approach for this task.


Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark

AAAI Conferences

Psychological research results have confirmed that people can have different emotional reactions to different visual stimuli. Several papers have been published on the problem of visual emotion analysis. In particular, attempts have been made to analyze and predict people's emotional reaction towards images. To this end, different kinds of hand-tuned features are proposed. The results reported on several carefully selected and labeled small image data sets have confirmed the promise of such features. While the recent successes of many computer vision related tasks are due to the adoption of Convolutional Neural Networks (CNNs), visual emotion analysis has not achieved the same level of success. This may be primarily due to the unavailability of confidently labeled and relatively large image data sets for visual emotion analysis. In this work, we introduce a new data set, which started from 3+ million weakly labeled images of different emotions and ended up 30 times as large as the current largest publicly available visual emotion data set. We hope that this data set encourages further research on visual emotion analysis. We also perform extensive benchmarking analyses on this large data set using state-of-the-art methods, including CNNs.


Context-Sensitive Twitter Sentiment Classification Using Neural Network

AAAI Conferences

Sentiment classification on Twitter has attracted increasing research in recent years. Most existing work focuses on feature engineering according to the tweet content itself. In this paper, we propose a context-based neural network model for Twitter sentiment analysis, incorporating contextualized features from relevant Tweets into the model in the form of word embedding vectors. Experiments on both balanced and unbalanced datasets show that our proposed models outperform the current state-of-the-art.