 Information Extraction


Radical-Based Hierarchical Embeddings for Chinese Sentiment Analysis at Sentence Level

AAAI Conferences

Text representation in Chinese sentiment analysis usually operates at the word or character level. In this paper, we show that radical-level processing can greatly improve sentiment classification performance. In particular, we propose two types of Chinese radical-based hierarchical embeddings. The embeddings incorporate not only semantics at the radical and character level, but also sentiment information. To evaluate our embeddings, we conduct Chinese sentiment analysis at sentence level on four different datasets. Experimental results validate our assumption that radical-level semantics and sentiments can contribute to sentence-level sentiment classification, and demonstrate the superiority of our embeddings over classic textual features and popular word and character embeddings.


Can Word Embeddings Help Find Latent Emotions in Text? Preliminary Results

AAAI Conferences

We report results of several experiments evaluating performance of word embeddings on semantic similarity of emotions. Our experiments suggest that the standard embeddings like GloVe and Word2Vec have very limited applicability in identifying emotions in text. Namely, using the standard arithmetic of emotions as a test, we show the mean reciprocal rank of a correct response is about 0.24, that is, combinations of word vectors are not a good proxy for expressed emotions. For example, the sum vector Joy+Fear, contrary to expectations, is not close to the vector representing Guilt. In addition, the opposite emotions, like Pessimism and Delight, have relatively high similarity to each other as word vectors (on average 0.2-0.44). Another experiment shows relatively low similarity (0.2-0.3) of word embeddings for similar emotions, such as Anger and Envy. Thus the standard methods for producing word embeddings are not adequate to represent relationships between emotion words. We conclude with a few hypotheses about improving the accuracy of embeddings in representing emotions.
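The "arithmetic of emotions" test described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the three-dimensional vectors are invented toy values (real GloVe or Word2Vec vectors have hundreds of dimensions), and excluding the two operand words from the ranking is an assumption on our part.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def reciprocal_rank(query, target, vectors, exclude=()):
    """1 / rank of `target` when words are sorted by similarity to `query`."""
    candidates = [w for w in vectors if w not in exclude]
    ranked = sorted(candidates, key=lambda w: cosine(query, vectors[w]), reverse=True)
    return 1.0 / (ranked.index(target) + 1)

def mean_reciprocal_rank(tests, vectors):
    """Average reciprocal rank over (word_a, word_b, expected_word) triples."""
    total = 0.0
    for a, b, expected in tests:
        query = add(vectors[a], vectors[b])
        # Assumption: the operand words themselves are not valid answers.
        total += reciprocal_rank(query, expected, vectors, exclude=(a, b))
    return total / len(tests)

# Hypothetical toy vectors, chosen only to make the ranking easy to follow.
TOY = {
    "joy":   [1.0, 0.0, 0.0],
    "fear":  [0.0, 1.0, 0.0],
    "guilt": [0.0, 0.0, 1.0],
    "anger": [1.0, 1.0, 0.0],
}

# One test from the text: Joy + Fear is expected to land near Guilt.
TESTS = [("joy", "fear", "guilt")]

print(mean_reciprocal_rank(TESTS, TOY))  # 0.5: "anger" outranks "guilt"
```

With these toy values the sum vector joy+fear is closest to "anger", so the expected answer "guilt" is ranked second and the reciprocal rank is 0.5. A mean reciprocal rank of about 0.24, as reported above, would correspond to the correct word appearing on average around fourth place.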


Unsupervised Extraction of Training Data for Pre-Modern Chinese OCR

AAAI Conferences

Many mainstream OCR techniques involve training a character recognition model using labeled exemplary images of each individual character to be recognized. For modern printed writing, such data can be easily created by automated methods such as rasterizing appropriate font data to produce clean example images. For historical OCR in printing and writing styles distinct from those embodied in modern fonts, appropriate character images must instead be extracted from actual historical documents to achieve good recognition accuracy. For languages with small character sets it may be feasible to perform this process manually, but for languages with many thousands of characters, such as Chinese, manually collecting this data is often not practical.


An Efficient Deep Neural Architecture for Multilingual Sentiment Analysis in Twitter

AAAI Conferences

Sentiment analysis of tweets is often monolingual and the models provided by machine learning classifiers are usually not applicable across distinct languages. Cross-language sentiment classification usually relies on machine translation strategies in which a source language is translated to the desired target language. Machine translation is costly and the provided results are limited by the quality of the translation that is performed. In this paper, we propose an efficient translation-free deep neural architecture for performing multilingual sentiment analysis of tweets. Our proposed approach benefits from a cost-effective character-based embedding and from optimized convolutions to learn from multiple distinct languages. The resulting model is capable of learning latent features from all languages used during training at once and it does not require any translation process to be performed whatsoever. We empirically evaluate the efficiency and effectiveness of the proposed approach in tweet corpora from four different languages and we show that it presents the best trade-off among four distinct state-of-the-art deep neural architectures for sentiment analysis.



Mining Twitter Data with Python Part 1: Collecting Data

@machinelearnbot

Twitter is a popular social network where users can share short SMS-like messages called tweets. Users share thoughts, links and pictures on Twitter, journalists comment on live events, and companies promote products and engage with customers. The list of different ways to use Twitter could be really long, and with 500 million tweets per day, there's a lot of data to analyse and to play with. This is the first in a series of articles dedicated to mining data on Twitter using Python. In this first part, we'll see different options to collect data from Twitter.


Sentiment Analysis & Predictive Analytics for trading. Avoid this systematic mistake

@machinelearnbot

Many common mistakes can be avoided when testing sentiment data for predictive properties. The term "prediction" is not a legal definition. In assessing the predictive qualities of sentiment data, there are no rules for what counts as a signal to be tested for predictive properties with regard to financial assets. However, the method you choose ultimately defines what you mean by the term "prediction". To illustrate the point: using a more prudent definition of the term, the accuracy in the world's most famous prediction study could have been as low as 47% (7 out of 15) instead of 87% (13 out of 15). An accuracy rate of 47% would not have produced worldwide media attention and more than 1600 academic citations, in my view.
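The two accuracy figures quoted above follow directly from the hit counts. A quick check in Python (the helper name `accuracy_percent` is ours, used only for illustration):

```python
def accuracy_percent(correct, total):
    """Share of correct predictions, rounded to the nearest whole percent."""
    return round(100 * correct / total)

# Same 15 observations, two definitions of what counts as a correct prediction.
print(accuracy_percent(7, 15))   # 47
print(accuracy_percent(13, 15))  # 87
```

The sample is identical in both cases; only the definition of a "hit" changes, which is enough to move the result from below chance level to an apparently strong signal.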


A Sentiment Analysis System to Improve Teaching and Learning

IEEE Computer

Natural language processing and machine learning can be applied to student feedback to help university administrators and teachers address problematic areas in teaching and learning. The proposed system analyzes student comments from both course surveys and online sources to identify sentiment polarity, the emotions expressed, and satisfaction versus dissatisfaction. A comparison with direct-assessment results demonstrates the system's reliability.


People on Drugs: Credibility of User Statements in Health Communities

arXiv.org Machine Learning

Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs --- this being one of the problems where large scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information.


Artificial Intelligence and Machine Learning Are Now Driving Marketing and Customer Engagement Activities

@machinelearnbot

As that "mention" gets pulled into the system, an AI called Natural Language Processing reads the post text and determines its "Sentiment." Sentiment is used to determine if the post is positive, neutral, or negative (and in some advanced cases, the emotion, like "anger," "sadness," or "joy"). Doing this manually for every post that comes in isn't feasible (we see tens of thousands of posts in any given week). AI does this automatically for us, and it can "learn" to improve its NLP Sentiment analysis as more posts pass through it, and as manual adjustments for errors are made. And speaking of Christian's post, he used the "#nofilter" hashtag, which can be assigned a "proud" tag since it's basically saying "my picture was so good, I didn't need to edit it."