Yasavur, Ugan (Florida International University) | Travieso, Jorge (Florida International University) | Lisetti, Christine (Florida International University) | Rishe, Naphtali David (Florida International University)
There is an increasing interest for valence and emotion sensing using a variety of signals. Text, as a communication channel, gathers a substantial amount of interest for recognizing its underlying sentiment (valence or polarity), affect or emotion (e.g. happy, sadness). We consider recognizing the valence of a sentence as a prior task to emotion sensing. In this article, we discuss our approach to classify sentences in terms of emotional valence. Our supervised system performs syntactic and semantic analysis for feature extraction. It processes the interactions between words in sentences by using dependency parse trees, and it can decide the current polarity of named-entities based on on-the-fly topic modeling. We compared 3 rule-based approaches and two supervised approaches (i.e. Naive Bayes and Maximum Entropy). We trained and tested our system using the SemEval-2007 affective text dataset, which contains news headlines extracted from news websites. Our results show that our systems outperform the systems demonstrated in SemEval-2007.
User generated content is extremely valuable for mining market intelligence because it is unsolicited. We study the problem of analyzing users' sentiment and opinion in their blog, message board, etc. posts with respect to topics expressed as a search query. In the scenario we consider the matches of the search query terms are expanded through coreference and meronymy to produce a set of mentions. The mentions are contextually evaluated for sentiment and their scores are aggregated (using a data structure we introduce call the sentiment propagation graph) to produce an aggregate score for the input entity. An extremely crucial part in the contextual evaluation of individual mentions is finding which sentiment expressions are semantically related to (target) which mentions --- this is the focus of our paper. We present an approach where potential target mentions for a sentiment expression are ranked using supervised machine learning (Support Vector Machines) where the main features are the syntactic configurations (typed dependency paths) connecting the sentiment expression and the mention. We have created a large English corpus of product discussions blogs annotated with semantic types of mentions, coreference, meronymy and sentiment targets. The corpus proves that coreference and meronymy are not marginal phenomena but are really central to determining the overall sentiment for the top-level entity. We evaluate a number of techniques for sentiment targeting and present results which we believe push the current state-of-the-art.
Shastri, Lokendra (Infosys Technologies Limited) | Parvathy, Anju G. (Infosys Technologies Limited) | Kumar, Abhishek (Infosys Technologies Limited) | Wesley, John (Infosys Technologies Limited) | Blakrishnan, Rajesh (Infosys Technologies Limited)
Much of the ongoing explosion of digital content is in the form of text. This content is a virtual gold-mine of information that can inform a range of social, governmental, and business decisions. For example, using content available on blogs and social networking sites businesses can find out what its customers are saying about their products and services. In the digital age where customer is king, the business value of ascertaining consumer sentiment cannot be overstated. People express sentiments in myriad ways. At times, they use simple, direct assertions, but most often they use sentences involving comparisons, conjunctions expressing multiple and possibly opposing sentiments about multiple features and entities,and pronominal references whose resolution requires discourse level context. Frequently people use abbreviations, slang, SMSese, idioms and metaphors. Understanding the latter also requires common sense reasoning. In this paper, we present iSEE, a fully implemented sentiment extraction engine, which makes use of statistical methods, classical NLU techniques, common sense reasoning, and probabilistic inference to extract entity and feature specific sentiment from complex sentences and dialog. Most of the components of iSEE are domain independent and the system can be generalized to new domains by simply adding domain relevant lexicons.
Exploratory data analysis is one of the most important parts of any machine learning workflow and Natural Language Processing is no different. But which tools you should choose to explore and visualize text data efficiently? In this article, we will discuss and implement nearly all the major techniques that you can use to understand your text data and give you a complete(ish) tour into Python tools that get the job done. In this article, we will use a million news headlines dataset from Kaggle. Now, we can take a look at the data. The dataset contains only two columns, the published date, and the news heading. For simplicity, I will be exploring the first 10000 rows from this dataset.
This article was originally posted by Shahul ES on the Neptune blog. Exploratory data analysis is one of the most important parts of any machine learning workflow and Natural Language Processing is no different. But which tools you should choose to explore and visualize text data efficiently? In this article, we will discuss and implement nearly all the major techniques that you can use to understand your text data and give you a complete(ish) tour into Python tools that get the job done. In this article, we will use a million news headlines dataset from Kaggle. Now, we can take a look at the data. The dataset contains only two columns, the published date, and the news heading.