Targeting Sentiment Expressions through Supervised Ranking of Linguistic Configurations

AAAI Conferences

User-generated content is extremely valuable for mining market intelligence because it is unsolicited. We study the problem of analyzing users' sentiment and opinions in their posts (on blogs, message boards, etc.) with respect to topics expressed as a search query. In the scenario we consider, the matches of the search query terms are expanded through coreference and meronymy to produce a set of mentions. The mentions are contextually evaluated for sentiment, and their scores are aggregated (using a data structure we introduce, called the sentiment propagation graph) to produce an aggregate score for the input entity. A crucial part of the contextual evaluation of individual mentions is finding which sentiment expressions are semantically related to (target) which mentions; this is the focus of our paper. We present an approach in which potential target mentions for a sentiment expression are ranked using supervised machine learning (Support Vector Machines), where the main features are the syntactic configurations (typed dependency paths) connecting the sentiment expression and the mention. We have created a large English corpus of product discussion blogs annotated with semantic types of mentions, coreference, meronymy, and sentiment targets. The corpus shows that coreference and meronymy are not marginal phenomena but are central to determining the overall sentiment for the top-level entity. We evaluate a number of techniques for sentiment targeting and present results that we believe advance the current state of the art.
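To make "syntactic configurations as features" concrete, the following is a minimal sketch, not the authors' system. It assumes a hand-written parse in the style of Stanford typed dependencies (with a collapsed `prep_of` relation), extracts the shortest typed dependency path from the sentiment expression to each candidate mention, and scores candidates with a linear function. In the paper the ranker is an SVM; here the weights are hypothetical, hand-set values standing in for what a trained linear model would learn, and all function names are illustrative.

```python
from collections import deque

def dependency_path(edges, source, target):
    """Shortest typed-dependency path from token `source` to token `target`.

    edges: (head, dependent, relation) triples over token indices.
    Steps are written 'down:rel' (head -> dependent) or 'up:rel' (dependent -> head).
    Returns a list of steps, or None if the two tokens are not connected.
    """
    adjacency = {}
    for head, dep, rel in edges:
        adjacency.setdefault(head, []).append((dep, "down:" + rel))
        adjacency.setdefault(dep, []).append((head, "up:" + rel))
    queue = deque([(source, [])])
    seen = {source}
    while queue:  # breadth-first search yields a shortest path
        node, path = queue.popleft()
        if node == target:
            return path
        for neighbor, step in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [step]))
    return None

def features(path):
    """Configuration features: the full typed path as one indicator, plus its length."""
    return {"path=" + " ".join(path): 1.0, "length": float(len(path))}

def rank_targets(edges, sentiment_token, candidates, weights):
    """Score each candidate mention with a linear model and sort best-first."""
    scored = []
    for mention in candidates:
        path = dependency_path(edges, sentiment_token, mention)
        if path is None:
            continue  # no syntactic connection, so not a candidate target
        score = sum(weights.get(name, 0.0) * value for name, value in features(path).items())
        scored.append((mention, score))
    return sorted(scored, key=lambda p: (-p[1], p[0]))

# Hypothetical example: "I love the camera of this phone".
tokens = ["I", "love", "the", "camera", "of", "this", "phone"]
edges = [(1, 0, "nsubj"), (1, 3, "dobj"), (3, 2, "det"), (3, 6, "prep_of"), (6, 5, "det")]
weights = {  # hand-set for illustration; a trained SVM would supply these
    "path=down:dobj": 2.0,
    "path=down:dobj down:prep_of": 0.5,
    "path=down:nsubj": -1.0,
    "length": -0.3,
}
ranking = rank_targets(edges, 1, [0, 3, 6], weights)
print([tokens[i] for i, _ in ranking])  # ['camera', 'phone', 'I']
```

Using the whole path string as a single feature mirrors the idea that a specific configuration (for example, a direct object of the sentiment verb) is strong evidence of targeting, while the length feature lets the model prefer syntactically closer mentions.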


LDA for Text Summarization and Topic Detection - DZone AI

#artificialintelligence

Machine learning clustering techniques are not the only way to extract topics from a text data set. The text mining literature has proposed a number of statistical models, known as probabilistic topic models, to detect topics in an unlabeled set of documents. One of the most popular is latent Dirichlet allocation (LDA), developed by Blei, Ng, and Jordan [i]. LDA is a generative, unsupervised probabilistic model that discovers K topics in a data set (K is chosen by the user), each of which can be described by its N most relevant keywords. In other words, each document in the data set is represented as a random mixture of latent topics, where each topic is characterized by a probability distribution over a fixed vocabulary. The per-document topic mixtures are drawn from a Dirichlet prior, as are the per-topic word distributions in the commonly used smoothed variant.
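Fitting LDA means inverting this generative story to recover the topic assignments. As a minimal sketch, the code below uses collapsed Gibbs sampling, a widely used inference method that is not the variational approach of the original paper. The hyperparameters (`alpha=0.1`, `beta=0.01`), the toy corpus, and the function names are assumptions for illustration only.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word tokens.
    Returns (theta, topic_word): per-document topic proportions and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]  # n_{d,k}: tokens in doc d assigned to topic k
    topic_word = [{} for _ in range(num_topics)]  # n_{k,w}: times word w is assigned to topic k
    topic_total = [0] * num_topics                # n_k: tokens assigned to topic k
    z = []                                        # current topic assignment of every token

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] = topic_word[k].get(w, 0) + delta
        topic_total[k] += delta

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            update(d, w, k, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove this token's current assignment
                # Unnormalized full conditional p(z = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k].get(w, 0) + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                update(d, w, k, +1)

    # Posterior mean of each document's topic mixture.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word

def top_words(topic_word, n):
    """The n most frequent words per topic (ties broken alphabetically)."""
    return [
        [w for w, c in sorted(counts.items(), key=lambda p: (-p[1], p[0])) if c > 0][:n]
        for counts in topic_word
    ]

if __name__ == "__main__":
    corpus = [
        "battery charge battery power".split(),
        "screen display screen pixels".split(),
        "power battery charge".split(),
        "display pixels screen".split(),
    ]
    theta, topic_word = lda_gibbs(corpus, num_topics=2)
    print(top_words(topic_word, 3))
```

`top_words` corresponds to describing each topic by its N most relevant keywords, and each row of `theta` is one document's mixture over the K topics.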


Pazhayidam George

AAAI Conferences

Electronic discovery is an interesting subproblem of information retrieval in which one identifies, from an electronically stored document collection (a corpus), the documents that are potentially relevant to the issues and facts of a legal case. In this paper, we consider representing documents in a topic space using well-known topic models such as latent Dirichlet allocation and latent semantic indexing, and solving the information retrieval problem by finding document similarities in the topic space rather than in the corpus vocabulary space. We also develop an iterative SMART ranking and categorization framework with a human in the loop, who labels a set of seed (training) documents; these are used to build a semi-supervised binary document classification model based on Support Vector Machines. To improve this model, we propose a method for choosing seed documents from the whole population via an active learning strategy. We report the results of our experiments on a real dataset from the electronic discovery domain.
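The core retrieval step, comparing documents in topic space instead of vocabulary space, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's SMART framework: the topic vectors are taken as already computed (for example, LDA document-topic proportions or LSI coordinates), the toy values are hypothetical, and scoring against the centroid of relevant seed documents is one simple choice among many. The SVM classifier and active learning loop described above are not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Component-wise mean of the topic vectors of the relevant seed documents."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def rank_by_topic_similarity(seed_vectors, doc_vectors):
    """Rank documents by cosine similarity to the centroid of relevant seeds.

    doc_vectors: dict mapping document id -> topic vector.
    """
    query = centroid(seed_vectors)
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in doc_vectors.items()]
    return sorted(scored, key=lambda p: (-p[1], p[0]))

# Hypothetical two-topic vectors for three unlabeled documents.
seeds = [[0.9, 0.1], [0.7, 0.3]]
docs = {"d1": [0.9, 0.1], "d2": [0.1, 0.9], "d3": [0.6, 0.4]}
print([doc_id for doc_id, _ in rank_by_topic_similarity(seeds, docs)])  # ['d1', 'd3', 'd2']
```

Because topic vectors have only K dimensions, documents that share no literal terms can still score as similar when they load on the same topics, which is the motivation for leaving the vocabulary space.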


Deeply Moving: Deep Learning for Sentiment Analysis

@machinelearnbot

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way. Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition. To remedy this, we introduce a Sentiment Treebank. It includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. To address them, we introduce the Recursive Neural Tensor Network.
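The Recursive Neural Tensor Network builds a phrase vector bottom-up over the parse tree. For child vectors a and b of dimension d, the parent is p = tanh([a; b]^T V^[1:d] [a; b] + W [a; b]), where each of the d slices of the tensor V is a 2d x 2d matrix and W is d x 2d. A softmax classifier applied to every node's vector then predicts that phrase's sentiment (five classes, from very negative to very positive, in the treebank). The sketch below shows only this forward pass, in plain Python for readability: the weights are random and untrained, and the function names are hypothetical.

```python
import math
import random

def rntn_compose(a, b, V, W):
    """Combine child vectors a and b (length d) into a parent vector of length d.

    For each output dimension k: p_k = tanh(x^T V[k] x + (W x)_k), with x = [a; b].
    """
    x = list(a) + list(b)  # concatenation [a; b], length 2d
    d, n = len(a), len(x)
    parent = []
    for k in range(d):
        bilinear = sum(x[i] * V[k][i][j] * x[j] for i in range(n) for j in range(n))
        linear = sum(W[k][i] * x[i] for i in range(n))
        parent.append(math.tanh(bilinear + linear))
    return parent

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def encode(tree, embeddings, V, W):
    """Bottom-up encoding of a binary parse tree: a word (str) or a (left, right) pair."""
    if isinstance(tree, str):
        return embeddings[tree]
    left, right = tree
    return rntn_compose(encode(left, embeddings, V, W), encode(right, embeddings, V, W), V, W)

def sentiment_distribution(vector, Ws):
    """Per-node sentiment classifier: softmax(Ws . vector) over the label classes."""
    return softmax([sum(w * h for w, h in zip(row, vector)) for row in Ws])

if __name__ == "__main__":
    rng = random.Random(0)
    d, classes = 4, 5  # 5 classes: very negative ... very positive

    def rand():
        return rng.uniform(-0.1, 0.1)  # untrained random weights

    embeddings = {w: [rand() for _ in range(d)] for w in ["not", "very", "good"]}
    V = [[[rand() for _ in range(2 * d)] for _ in range(2 * d)] for _ in range(d)]
    W = [[rand() for _ in range(2 * d)] for _ in range(d)]
    Ws = [[rand() for _ in range(d)] for _ in range(classes)]
    root = encode(("not", ("very", "good")), embeddings, V, W)
    print(sentiment_distribution(root, Ws))
```

Setting every slice of V to zero reduces the composition to the standard recursive neural network p = tanh(W [a; b]); the tensor term is what lets the two children interact multiplicatively, as in negation.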


Model Deployment for Data Scientists Using TensorFlow: Part 1 - Nightfall AI

#artificialintelligence

In the world of machine learning, model deployment is a crucial piece of the puzzle. While data scientists excel at other parts of the pipeline, deploying machine learning models tends to fall under the umbrella of software engineering or IT operations. There is good reason for this: successful deployments require many complex tasks, including building infrastructure, implementing APIs, load balancing, and integrating with data pipelines. We'll briefly walk through a basic deployment example, choosing tools and planning an approach for building and serving a simple sentiment classification model. By the end of this post, you will have the tools to serve your deep learning (DL) models via an API.
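The essence of "serving a model via an API" is a process that accepts a JSON request, runs the model, and returns a JSON prediction. The post itself uses TensorFlow; the sketch below deliberately swaps in only the Python standard library (`http.server`) so that it runs anywhere, and replaces the trained model with a hypothetical word-list scorer. The `/predict` route, the request shape `{"text": ...}`, and the helper names are assumptions for illustration, not part of any TensorFlow API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical stand-in for a trained sentiment model: a tiny word-list scorer.
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}

def predict_sentiment(text):
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"label": label, "score": score}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            result = predict_sentiment(payload["text"])
        except (ValueError, KeyError, TypeError, AttributeError):
            self.send_error(400, "expected a JSON body with a 'text' field")
            return
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

def start_server(port=0):
    """Serve on localhost in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def query(server, text):
    """Client helper: POST text to /predict and decode the JSON reply."""
    host, port = server.server_address
    request = Request(
        f"http://{host}:{port}/predict",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.loads(response.read())

if __name__ == "__main__":
    server = start_server()
    print(query(server, "I love this great phone"))
    server.shutdown()
```

In a real deployment, `predict_sentiment` would load and call the trained model once per request, and concerns such as load balancing and batching would be handled by dedicated serving infrastructure rather than a single-threaded server.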