Goto

Collaborating Authors

 Discourse & Dialogue


How Important Is Size? An Investigation of Corpus Size and Meaning in Both Latent Semantic Analysis and Latent Dirichlet Allocation

AAAI Conferences

This study examines how differences in corpus size influence the accuracy of Latent Semantic Analysis (LSA) spaces and Latent Dirichlet Allocation (LDA) spaces in two tasks: a word association task and a vocabulary definition test. Specific optimizations were considered in building each semantic model. Initial results indicate that larger corpora lead to greater accuracy and that LDA probabilistic models, similar to LSA vector spaces, can provide insights into cognitive processing at semantic levels.


An Efficient Deep Neural Architecture for Multilingual Sentiment Analysis in Twitter

AAAI Conferences

Sentiment analysis of tweets is often monolingual and the models provided by machine learning classifiers are usually not applicable across distinct languages. Cross-language sentiment classification usually relies on machine translation strategies in which a source language is translated to the desired target language. Machine translation is costly and the provided results are limited by the quality of the translation that is performed. In this paper, we propose an efficient translation-free deep neural architecture for performing multilingual sentiment analysis of tweets. Our proposed approach benefits from a cost-effective character-based embedding and from optimized convolutions to learn from multiple distinct languages. The resulting model is capable of learning latent features from all languages used during training at once and it does not require any translation process to be performed whatsoever. We empirically evaluate the efficiency and effectiveness of the proposed approach in tweet corpora from four different languages and we show that it presents the best trade-off among four distinct state-of-the-art deep neural architectures for sentiment analysis.



Sentiment Analysis & Predictive Analytics for trading. Avoid this systematic mistake

@machinelearnbot

Many common mistakes can be avoided when testing sentiment data for predictive properties. The term "prediction" is not a legal definition. In assessing the predictive qualities of sentiment data there are no rules for what counts as a signal to be tested for predictive properties with regard to financial assets. However, the method you chose ultimately defines what you mean with the term "prediction". To illustrate the point: Using a more prudent definition of the term, the accuracy in the world's most famous prediction study could have been as low as 47% (7 out of 15) instead of 87% (13 out of 15%). An accuracy rate of 47% would not have produced worldwide media attention and more than 1600 academic citations, in my view.


A Sentiment Analysis System to Improve Teaching and Learning

IEEE Computer

Natural language processing and machine learning can be applied to student feedback to help university administrators and teachers address problematic areas in teaching and learning. The proposed system analyzes student comments from both course surveys and online sources to identify sentiment polarity, the emotions expressed, and satisfaction versus dissatisfaction. A comparison with direct-assessment results demonstrates the system's reliability.


People on Media: Jointly Identifying Credible News and Trustworthy Citizen Journalists in Online Communities

arXiv.org Machine Learning

Media seems to have become more partisan, often providing a biased coverage of news catering to the interest of specific groups. It is therefore essential to identify credible information content that provides an objective narrative of an event. News communities such as digg, reddit, or newstrust offer recommendations, reviews, quality ratings, and further insights on journalistic works. However, there is a complex interaction between different factors in such online communities: fairness and style of reporting, language clarity and objectivity, topical perspectives (like political viewpoint), expertise and bias of community members, and more. This paper presents a model to systematically analyze the different interactions in a news community between users, news, and sources. We develop a probabilistic graphical model that leverages this joint interaction to identify 1) highly credible news articles, 2) trustworthy news sources, and 3) expert users who perform the role of "citizen journalists" in the community. Our method extends CRF models to incorporate real-valued ratings, as some communities have very fine-grained scales that cannot be easily discretized without losing information. To the best of our knowledge, this paper is the first full-fledged analysis of credibility, trust, and expertise in news communities.


Credible Review Detection with Limited Information using Consistency Analysis

arXiv.org Machine Learning

Online reviews provide viewpoints on the strengths and shortcomings of products/services, influencing potential customers' purchasing decisions. However, the proliferation of non-credible reviews -- either fake (promoting/ demoting an item), incompetent (involving irrelevant aspects), or biased -- entails the problem of identifying credible reviews. Prior works involve classifiers harnessing rich information about items/users -- which might not be readily available in several domains -- that provide only limited interpretability as to why a review is deemed non-credible. This paper presents a novel approach to address the above issues. We utilize latent topic models leveraging review texts, item ratings, and timestamps to derive consistency features without relying on item/user histories, unavailable for "long-tail" items/users. We develop models, for computing review credibility scores to provide interpretable evidence for non-credible reviews, that are also transferable to other domains -- addressing the scarcity of labeled data. Experiments on real-world datasets demonstrate improvements over state-of-the-art baselines.


People on Drugs: Credibility of User Statements in Health Communities

arXiv.org Machine Learning

Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs --- this being one of the problems where large scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users that are likely to contribute valuable medical information.


Artificial Intelligence and Machine Learning Are Now Driving Marketing and Customer Engagement Activities

@machinelearnbot

As that "mention" gets pulled into the system, an AI called Natural Language Processing reads the post text and determines its "Sentiment." Sentiment is used to determine if the post is positive, neutral, or negative, (and in some advanced cases, the emotion like "anger" "sadness" or "joy"). Doing this manually for every post that comes in isn't feasible (we see tens of thousands of posts on any given week). AI does this automatically for us, and it can "learn" to improve its NLP Sentiment analysis as more posts pass through it, and as manual adjustments for errors are made. And speaking of Christian's post, he used the "#nofilter" hashtag which can be assigned a "proud" tag since its basically saying "my picture was so good, I didn't need to edit it."