User generated content is extremely valuable for mining market intelligence because it is unsolicited. We study the problem of analyzing users' sentiment and opinion in their blog, message board, etc. posts with respect to topics expressed as a search query. In the scenario we consider the matches of the search query terms are expanded through coreference and meronymy to produce a set of mentions. The mentions are contextually evaluated for sentiment and their scores are aggregated (using a data structure we introduce call the sentiment propagation graph) to produce an aggregate score for the input entity. An extremely crucial part in the contextual evaluation of individual mentions is finding which sentiment expressions are semantically related to (target) which mentions --- this is the focus of our paper. We present an approach where potential target mentions for a sentiment expression are ranked using supervised machine learning (Support Vector Machines) where the main features are the syntactic configurations (typed dependency paths) connecting the sentiment expression and the mention. We have created a large English corpus of product discussions blogs annotated with semantic types of mentions, coreference, meronymy and sentiment targets. The corpus proves that coreference and meronymy are not marginal phenomena but are really central to determining the overall sentiment for the top-level entity. We evaluate a number of techniques for sentiment targeting and present results which we believe push the current state-of-the-art.
As part of the task of automated question answering from a large collection of text documents, the reduction of the search space to a smaller set of document passages that are actually searched for answers constitutes a difficult but rewarding research issue. We propose a set of precision-enhancing filters for passage retrieval based on semantic constraints detected in the submitted questions. The approach improves the performance of the underlying question answering system in terms of both answer accuracy and time performance.
Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman-Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate. For example, when performing entity resolution, the size of each cluster should be unrelated to the size of the data set, and each cluster should contain a negligible fraction of the total number of data points. These applications require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new class of models that can exhibit this property. We compare models within this class to two commonly used clustering models using four entity-resolution data sets.
The Investment Arm of a Financial Corporation used Amenity's API to create a dataset on the biggest players in the autonomous car market to support its strategy and business development efforts. Amenity extracted the most critical autonomous car industry news on a daily basis, identifying actionable patterns across the industry over time. The industry analysis model featured custom event types and modified versions of core taxonomies to best identify insights that are meaningful to the autonomous car industry. Amenity was able to acheive a high degree of accuracy using its proprietary NLP API including tokenization, lemmatization, named entity recognition (NER), dependency parsing and semantic role labeling. The result was a comprehensive industry analysis that provided a 360 degree view on the Autonomous Car industry financials, the supply chain, retail, OEM, suppliers, and technological trends.
IBM has teamed up with Local Motors, a Phoenix-based automotive manufacturer that made the first 3D-printed car, to create a self-driving electric bus. Named "Olli," the bus has room for 12 people and uses IBM Watson's cloud-based cognitive computing system to provide information to passengers. In addition to automatically driving you where you want to go using Phoenix Wings autonomous driving technology, Olli can respond to questions and provide information, similar to Amazon's Echo home assistant. The bus debuts today in the Washington D.C. area for the public to use during select times over the next several months, and the IBM-Local Motors team hopes to introduce Olli to the Miami and Las Vegas areas by the end of the year. By using Watson's speech to text, natural language classifier, entity extraction, and text to speech APIs, the bus can provide several services beyond taking you to your destination.