AITopics | Information Extraction

Information Extraction

Radical-Based Hierarchical Embeddings for Chinese Sentiment Analysis at Sentence Level

Peng, Haiyun (Nanyang Technological University) | Cambria, Erik (Nanyang Technological University) | Zou, Xiaomei (Harbin Engineering University)

AAAI Conferences, May-16-2017

Text representation in Chinese sentiment analysis usually operates at the word or character level. In this paper, we prove that radical-level processing could greatly improve sentiment classification performance. In particular, we propose two types of Chinese radical-based hierarchical embeddings. The embeddings incorporate not only semantics at the radical and character level, but also sentiment information. To evaluate our embeddings, we conduct sentence-level Chinese sentiment analysis on four different datasets. Experimental results validate our assumption that radical-level semantics and sentiments can contribute to sentence-level sentiment classification and demonstrate the superiority of our embeddings over classic textual features and popular word and character embeddings.
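To make the idea of radical-level features concrete, here is a minimal Python sketch. It is not the authors' hierarchical embedding method: it assumes a tiny hand-made radical table (好 = 女 + 子, 妈 = 女 + 马, 情 = 忄 + 青) and invented 2-dimensional vectors, and `RADICALS`, `char_repr`, and `sentence_repr` are hypothetical names. Each character vector is concatenated with the mean of its radical vectors, and a sentence is represented by the mean of those augmented character vectors, which a downstream sentiment classifier could consume.

```python
# Hypothetical radical table: real decompositions, but a tiny hand-made subset.
RADICALS = {"好": ["女", "子"], "妈": ["女", "马"], "情": ["忄", "青"]}

def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def char_repr(ch, char_emb, rad_emb, dim):
    """Concatenate a character vector with the mean of its radical vectors."""
    rad_vecs = [rad_emb[r] for r in RADICALS.get(ch, []) if r in rad_emb]
    rad_part = mean(rad_vecs) if rad_vecs else [0.0] * dim  # zeros if no known radicals
    return char_emb.get(ch, [0.0] * dim) + rad_part         # list concatenation

def sentence_repr(sentence, char_emb, rad_emb, dim):
    """Average the radical-augmented character vectors over a sentence."""
    return mean([char_repr(ch, char_emb, rad_emb, dim) for ch in sentence])

# Invented 2-d embeddings, for illustration only (not trained values).
char_emb = {"好": [1.0, 0.0], "妈": [0.0, 1.0]}
rad_emb = {"女": [1.0, 1.0], "子": [0.0, 1.0], "马": [1.0, 0.0]}

print(char_repr("好", char_emb, rad_emb, dim=2))  # [1.0, 0.0, 0.5, 1.0]
print(sentence_repr("妈妈好", char_emb, rad_emb, dim=2))
```

Concatenation keeps the character and radical signals in separate dimensions, so a classifier can weight them independently; the paper's own composition may differ.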

artificial intelligence, natural language, radical-based hierarchical embedding

AAAI Conferences

The Thirtieth International Flairs Conference

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Can Word Embeddings Help Find Latent Emotions in Text? Preliminary Results

Seyeditabari, Armin (University of North Carolina at Charlotte) | Zadrozny, Wlodek (University of North Carolina at Charlotte)

AAAI Conferences, May-16-2017

We report results of several experiments evaluating the performance of word embeddings on the semantic similarity of emotions. Our experiments suggest that standard embeddings like GloVe and Word2Vec have very limited applicability in identifying emotions in text. Namely, using the standard arithmetic of emotions as a test, we show that the mean reciprocal rank of a correct response is about 0.24; that is, combinations of word vectors are not a good proxy for expressed emotions. For example, contrary to expectations, the sum vector Joy+Fear is not close to the vector representing Guilt. In addition, opposite emotions, such as Pessimism and Delight, have relatively high similarity to each other as word vectors (on average 0.2-0.44). Another experiment shows relatively low similarity (0.2-0.3) of word embeddings for similar emotions, such as Anger and Envy. Thus the standard methods for producing word embeddings are not adequate to represent relationships between emotion words. We conclude with a few hypotheses about improving the accuracy of embeddings in representing emotions.
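The two measures in this abstract, cosine similarity between an emotion-sum vector and a target word, and mean reciprocal rank (MRR), can be sketched in a few lines of Python. The 3-dimensional vectors below are invented for illustration and are not GloVe or Word2Vec values; excluding the query words from the candidate list is an assumed convention borrowed from word-analogy tests, and the paper's exact protocol may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def add(u, v):
    """Element-wise vector sum, e.g. Joy + Fear."""
    return [a + b for a, b in zip(u, v)]

def rank_of(target, query, vocab, exclude=()):
    """1-based rank of `target` among candidate words, sorted by similarity to `query`."""
    candidates = [w for w in vocab if w not in exclude]
    ordered = sorted(candidates, key=lambda w: cosine(query, vocab[w]), reverse=True)
    return ordered.index(target) + 1

def mean_reciprocal_rank(ranks):
    """MRR: average of 1/rank of the correct answer over all test queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Toy 3-d vectors, invented for illustration (NOT real GloVe/Word2Vec values).
vocab = {
    "joy": [0.9, 0.1, 0.0],
    "fear": [0.0, 0.8, 0.3],
    "guilt": [0.1, 0.2, 0.9],
    "delight": [0.8, 0.3, 0.1],
}

query = add(vocab["joy"], vocab["fear"])
r = rank_of("guilt", query, vocab, exclude=("joy", "fear"))
print(r)                                # 2: "delight" is closer to joy+fear
print(mean_reciprocal_rank([r, 1, 4]))  # MRR over three hypothetical queries
```

With these toy vectors, Joy+Fear lands closer to Delight than to Guilt, so Guilt is ranked second and contributes a reciprocal rank of 0.5 to the average.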

artificial intelligence, natural language, word embedding help

AAAI Conferences

The Thirtieth International Flairs Conference

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.60)

Unsupervised Extraction of Training Data for Pre-Modern Chinese OCR

Sturgeon, Donald (Harvard University)

AAAI Conferences, May-16-2017

Many mainstream OCR techniques involve training a character recognition model using labeled exemplary images of each individual character to be recognized. For modern printed writing, such data can easily be created by automated methods, such as rasterizing appropriate font data to produce clean example images. For historical OCR in printing and writing styles distinct from those embodied in modern fonts, appropriate character images must instead be extracted from actual historical documents to achieve good recognition accuracy. For languages with small character sets it may be feasible to perform this process manually, but for languages with many thousands of characters, such as Chinese, manually collecting this data is often not practical.

pre-modern chinese ocr, training data, unsupervised extraction

AAAI Conferences

The Thirtieth International Flairs Conference

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.40)

An Efficient Deep Neural Architecture for Multilingual Sentiment Analysis in Twitter

Becker, Willian (Pontifícia Universidade Católica do Rio Grande do Sul) | Wehrmann, Jônatas (Pontifícia Universidade Católica do Rio Grande do Sul) | Cagnini, Henry E. L. (Pontifícia Universidade Católica do Rio Grande do Sul) | Barros, Rodrigo C. (Pontifícia Universidade Católica do Rio Grande do Sul)

AAAI Conferences, May-16-2017

Sentiment analysis of tweets is often monolingual, and the models produced by machine learning classifiers are usually not applicable across languages. Cross-language sentiment classification usually relies on machine translation strategies, in which text in a source language is translated into the desired target language. Machine translation is costly, and the results are limited by the quality of the translation. In this paper, we propose an efficient translation-free deep neural architecture for multilingual sentiment analysis of tweets. Our approach benefits from a cost-effective character-based embedding and from optimized convolutions to learn from multiple distinct languages. The resulting model learns latent features from all languages used during training at once, and it requires no translation at any stage. We empirically evaluate the efficiency and effectiveness of the proposed approach on tweet corpora in four different languages, and we show that it presents the best trade-off among four distinct state-of-the-art deep neural architectures for sentiment analysis.
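A character-level input pipeline is what lets one model share a single vocabulary across languages without translation. The sketch below is a generic character CNN front end in plain Python, not the authors' architecture: it builds a shared character vocabulary from toy tweets in English, Spanish, and Portuguese, encodes text into fixed-length id sequences, and applies one layer of 1-D convolutions followed by global max pooling. The function names, dimensions, random weights, and `max_len` value are assumptions chosen for illustration; a real model would learn the embedding and kernel weights.

```python
import random

PAD, UNK = 0, 1  # reserved ids for padding and unseen characters

def build_char_vocab(texts):
    """Map every character seen in training texts (any language) to an id >= 2."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}

def encode(text, vocab, max_len=140):
    """Truncate or pad a tweet to exactly max_len character ids."""
    ids = [vocab.get(ch, UNK) for ch in text[:max_len]]
    return ids + [PAD] * (max_len - len(ids))

def conv1d_max(ids, emb, kernels):
    """Embed ids, apply 'valid' 1-D convolutions, then max-pool over time."""
    seq = [emb[i] for i in ids]
    width, dim = len(kernels[0]), len(seq[0])
    feats = []
    for kern in kernels:  # one kernel per output filter
        scores = [
            sum(seq[t + k][d] * kern[k][d] for k in range(width) for d in range(dim))
            for t in range(len(seq) - width + 1)
        ]
        feats.append(max(scores))  # global max pooling
    return feats

rng = random.Random(0)
train = ["good day", "buen día", "bom dia"]  # toy multilingual "tweets"
vocab = build_char_vocab(train)
dim, width, n_filters = 8, 3, 4
emb = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(len(vocab) + 2)]
kernels = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(width)]
           for _ in range(n_filters)]

x = encode("dia bom!", vocab, max_len=12)
features = conv1d_max(x, emb, kernels)
print(x)
print(features)
```

Unseen characters such as `!` map to the reserved `UNK` id and short inputs are padded with `PAD`, so tweets in any language seen during training share one input format.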

Predicting Movie Ratings: NLP Tools is What Film Studios Need

#artificialintelligence, May-15-2017, 01:15:09 GMT

She writes about software development, UI and UX, natural language processing, Big Data, AI, and other IT-related topics.

artificial intelligence, natural language, text processing

#artificialintelligence

Country: North America (0.05)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.51)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.51)

Mining Twitter Data with Python Part 1: Collecting Data

@machinelearnbot, May-14-2017, 15:17:19 GMT

Twitter is a popular social network where users can share short SMS-like messages called tweets. Users share thoughts, links and pictures on Twitter, journalists comment on live events, and companies promote products and engage with customers. The list of different ways to use Twitter could be really long, and with 500 million tweets per day, there's a lot of data to analyse and to play with. This is the first in a series of articles dedicated to mining data on Twitter using Python. In this first part, we'll see different options to collect data from Twitter.

artificial intelligence, natural language, social media

@machinelearnbot

Country: Europe > United Kingdom > England > Greater London > London (0.05)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.53)

Sentiment Analysis & Predictive Analytics for trading. Avoid this systematic mistake

@machinelearnbot, May-13-2017, 16:20:06 GMT

Many common mistakes can be avoided when testing sentiment data for predictive properties. The term "prediction" has no legal definition. In assessing the predictive qualities of sentiment data, there are no rules for what counts as a signal to be tested for predictive properties with regard to financial assets. However, the method you choose ultimately defines what you mean by the term "prediction". To illustrate the point: using a more prudent definition of the term, the accuracy in the world's most famous prediction study could have been as low as 47% (7 out of 15) instead of 87% (13 out of 15). In my view, an accuracy rate of 47% would not have produced worldwide media attention and more than 1600 academic citations.
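The arithmetic behind the two accuracy figures, plus one extra reasoning step, fits in a short Python sketch. The coin-flip baseline (a 50% chance of calling each outcome correctly, with independent trials) is an assumption added for illustration and is not stated in the source; under it, 13 correct out of 15 would be unlikely by chance, whereas 7 out of 15 is entirely consistent with guessing.

```python
from fractions import Fraction
from math import comb

def accuracy(correct, total):
    """Fraction of predictions that were correct."""
    return Fraction(correct, total)

def p_at_least(k, n, p=Fraction(1, 2)):
    """Probability of k or more successes in n independent trials (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

lenient = accuracy(13, 15)
strict = accuracy(7, 15)
print(f"lenient: {float(lenient):.0%}, strict: {float(strict):.0%}")  # lenient: 87%, strict: 47%
print(f"P(>=13/15 by chance): {float(p_at_least(13, 15)):.4f}")      # 0.0037
print(f"P(>=7/15 by chance):  {float(p_at_least(7, 15)):.4f}")       # 0.6964
```

Exact `Fraction` arithmetic keeps the binomial tail free of rounding error until the final print.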

artificial intelligence, data mining, natural language

@machinelearnbot

Industry: Banking & Finance > Trading (0.52)

Technology:

Information Technology > Data Science > Data Mining (0.78)
Information Technology > Communications > Social Media (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.40)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.40)

A Sentiment Analysis System to Improve Teaching and Learning

IEEE Computer, May-10-2017, 21:40:10 GMT

Natural language processing and machine learning can be applied to student feedback to help university administrators and teachers address problematic areas in teaching and learning. The proposed system analyzes student comments from both course surveys and online sources to identify sentiment polarity, the emotions expressed, and satisfaction versus dissatisfaction. A comparison with direct-assessment results demonstrates the system's reliability.

artificial intelligence, natural language, sentiment analysis system

IEEE Computer

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.58)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.58)

People on Drugs: Credibility of User Statements in Health Communities

Mukherjee, Subhabrata, Weikum, Gerhard, Danescu-Niculescu-Mizil, Cristian

arXiv.org Machine Learning, May-6-2017

Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end, we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs, one of the problems where large-scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information.

data mining, machine learning, natural language

arXiv.org Machine Learning

arXiv:1705.02522

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Artificial Intelligence and Machine Learning Are Now Driving Marketing and Customer Engagement Activities

@machinelearnbot, May-5-2017, 12:45:44 GMT

As that "mention" gets pulled into the system, an AI called Natural Language Processing reads the post text and determines its "Sentiment." Sentiment is used to determine whether the post is positive, neutral, or negative (and, in some advanced cases, the emotion, such as "anger," "sadness," or "joy"). Doing this manually for every incoming post isn't feasible (we see tens of thousands of posts in any given week). AI does this automatically for us, and it can "learn" to improve its NLP sentiment analysis as more posts pass through it and as errors are corrected manually. And speaking of Christian's post, he used the "#nofilter" hashtag, which can be assigned a "proud" tag since it's basically saying "my picture was so good, I didn't need to edit it."

artificial intelligence and machine learning, marketing and customer engagement activity, natural language

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.31)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.31)