Information Extraction
Identifying Sentiment Words Using an Optimization Model with L1 Regularization
Deng, Zhi-Hong (Peking University) | Yu, Hongliang (Carnegie Mellon University) | Yang, Yunlun (Peking University)
Sentiment word identification is a fundamental task in numerous applications of sentiment analysis and opinion mining, such as review mining, opinion holder finding, and Twitter classification. In this paper, we propose an optimization model with L1 regularization, called ISOMER, for identifying sentiment words from a corpus. Unlike most existing approaches, which rely on seed words only, our model can employ both seed words and documents with sentiment labels. The L1 penalty in the objective function yields a sparse solution, which is appropriate because most candidate words carry no sentiment. Experiments on real datasets show that ISOMER outperforms the classic approaches, and that the lexicon learned by ISOMER can be effectively adapted to document-level sentiment analysis.
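To illustrate how an L1 penalty produces a sparse word-weight vector, here is a minimal sketch. It is not the ISOMER objective, which the abstract does not spell out. It assumes a plain least-squares loss on a toy document-term matrix, solved by proximal gradient descent (ISTA). The vocabulary, documents, labels, and the names `soft_threshold` and `fit_l1` are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1(X, y, lam=0.1, step=0.1, iters=500):
    """Minimize (1/2n) * ||Xw - y||^2 + lam * ||w||_1 with ISTA."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Residuals of the current linear prediction on each document.
        residuals = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        # Gradient of the smooth least-squares part.
        grad = [sum(X[i][j] * residuals[i] for i in range(n)) / n for j in range(d)]
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(d)]
    return w


# Toy data (assumed): word counts per document and document sentiment labels.
vocab = ["great", "awful", "the", "movie"]
X = [
    [1, 0, 1, 1],  # "the great movie"  -> positive
    [0, 1, 1, 1],  # "the awful movie"  -> negative
    [1, 0, 1, 0],  # "the great"        -> positive
    [0, 1, 1, 0],  # "the awful"        -> negative
]
y = [1.0, -1.0, 1.0, -1.0]

weights = dict(zip(vocab, fit_l1(X, y)))
sentiment_words = [word for word, weight in weights.items() if weight != 0.0]
print(weights)
print(sentiment_words)
```

In this toy setting the words that co-occur with document labels receive nonzero weights of the matching sign, while words that appear equally in positive and negative documents stay at exactly zero. The penalty also shrinks the surviving weights, a known side effect of L1 regularization.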
These U.S. States Like Bacon the Most, Based on Instagram Data
Here's a point that's difficult to argue: Bacon is delicious. The fatty pig product is a favorite food for breakfast or otherwise among many (looking at you, hipsters). Housewares retailer Ginny's recently released an interactive map dubbed the "50 States of Bacon," which uses Instagram data to show which states enjoy bacon the most – and least. According to the map, based in part on an analysis of more than 33,000 Instagram photos in the U.S. featuring the tag #bacon, Hawaii loves it the least, while Nebraskans are the most bacon-obsessed.
Work Smarter Not Harder - Teradata Text Analytics
Text analytics is a family of analytic capabilities that operate on typed or written text. This area of analytics seeks to learn from huge quantities of text data to expose human intent, sentiment, and behaviors. Examples include doctor notes, tweets, product/content reviews, survey text responses, and much more. There are varying types of text analytics, such as text parsing, Levenshtein distance, entity extraction, tagging/classification, and chunking, to name a few. Many areas of machine learning use text analytics as a data preparation step in order to develop models.
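As a concrete instance of one technique named above, the following sketch computes the Levenshtein (edit) distance between two strings with the standard dynamic-programming recurrence. It is a generic textbook implementation, not Teradata's; the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

A typical data-preparation use is fuzzy matching of misspelled names or product terms, where a small distance suggests two strings refer to the same item.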
Symbiotic Cognitive Computing through Iteratively Supervised Lexicon Induction
Alba, Alfredo (IBM Research) | Drews, Clemens (IBM Research) | Gruhl, Daniel (IBM Research) | Lewis, Neal (IBM Research) | Mendes, Pablo N. (IBM Research) | Nagarajan, Meenakshi (IBM Research) | Welch, Steve (IBM Research) | Coden, Anni (IBM Research) | Qadir, Ashequl (University of Utah)
In this paper, we approach a subset of semantic analysis tasks through a symbiotic cognitive computing approach: the user and the system learn from each other and accomplish the tasks better than they would on their own. Our approach starts with a domain expert building a simplified domain model (e.g. semantic lexicons) and annotating documents with that model. The system helps the user by allowing them to obtain quicker results, and by leading them to refine their understanding of the domain. Meanwhile, through the feedback from the user, the system adapts more quickly and produces more accurate results. We believe this virtuous cycle is key for building next-generation, high-quality semantic analysis systems. We present some preliminary findings and discuss our results on four aspects of this virtuous cycle, namely: the intrinsic incompleteness of semantic models, the need for a human in the loop, the benefits of a computer in the loop, and finally the overall improvements offered by the human-computer interaction in the process.
Creating a Mars Target Encyclopedia by Extracting Information from the Planetary Science Literature
Wagstaff, Kiri L. (Jet Propulsion Laboratory) | Riloff, Ellen (University of Utah) | Lanza, Nina L. (Los Alamos National Laboratory) | Mattmann, Chris A. (Jet Propulsion Laboratory) | Ramirez, Paul M. (Jet Propulsion Laboratory)
Staying up to date with the latest discoveries is a challenge in any scientific field. In planetary science, new observation targets on the surface of Mars are identified and named every day, and new publications announcing new discoveries and conclusions provide frequent updates about these targets. We are constructing a system that uses information extraction and retrieval methods to mine the steadily growing body of planetary science publications about Mars surface targets and automatically construct a concise summary of what is known about each target. The Mars Target Encyclopedia will provide a central, continually updated resource for use by planetary scientists and the interested public. We describe our use of Tika, Sundance, and AutoSlog to extract and summarize information, some of the challenges associated with this domain, and our plans for maturing the system.
EmoGram: An Open-Source Time Sequence-Based Emotion Tracker and Its Innovative Applications
Joshi, Aditya (Monash Research Academy) | Tripathi, Vaibhav (Indian Institute of Technology Bombay) | Soni, Ravindra (Indian Institute of Technology Bombay) | Bhattacharyya, Pushpak (Indian Institute of Technology Bombay) | Carman, Mark James (Monash University)
In this paper, we present an open-source emotion tracker and its innovative applications. Our tracker, EmoGram, tracks emotion changes for a sequence of textual units. It is versatile in terms of the textual unit (tweets, sentences in discourse, etc.) and also what constitutes the time sequence (timestamps of tweets, discourse nature of text, etc.). We demonstrate the utility of our system through our applications: a sequence of commentaries in cricket matches, a sequence of dialogues in a play, and a sequence of tweets related to the Maggi controversy in India in 2015. That one system can be used for these applications is the merit of EmoGram.
In the mood: the dynamics of collective sentiments on Twitter
Charlton, Nathaniel | Singleton, Colin | Greetham, Danica Vukadinović
We study the relationship between the sentiment levels of Twitter users and the evolving network structure that the users created by @-mentioning each other. We use a large dataset of tweets to which we apply three sentiment scoring algorithms, including the open source SentiStrength program. Specifically, we make three contributions. Firstly, we find that people who have potentially the largest communication reach (according to a dynamic centrality measure) use sentiment differently than the average user: for example, they use positive sentiment more often and negative sentiment less often. Secondly, we find that when we follow structurally stable Twitter communities over a period of months, their sentiment levels are also stable, and sudden changes in community sentiment from one day to the next can in most cases be traced to external events affecting the community. Thirdly, based on our findings, we create and calibrate a simple agent-based model that is capable of reproducing measures of emotive response comparable to those obtained from our empirical dataset.
WordStat 7.1: Geospatial Intelligence Meets Text Analytics
Provalis Research announces today the release of a new version of its powerful text analytics software, WordStat 7.1. The software release includes a geographic information system (GIS) mapping and data editing module, allowing businesses to obtain insightful geospatial intelligence. This innovative module provides users with the ability to create a wide range of maps out of pure text data. The analysis of unstructured text data with geographic affinity poses some challenges when an organization is seeking to obtain insightful results. "The implementation of tools currently on the market is a complex process that usually requires in-depth geographic information science (GIS) knowledge," says Normand Péladeau, Provalis Research's CEO.
Chief Technology Officer (CTO)
DigitalMR is an early-stage high-tech company in the space of market research and marketing. Following 4 years of focussed R&D in A.I., financed by multiple government grants and self-generated cash, we have developed a lot of unique I.P., some of which is patent pending. The main areas of our research are: text analytics (NLP, sentiment and semantic analysis, emotion detection and scoring) and automated image theme and sentiment analysis. We work with blue-chip multinationals such as P&G, SABMiller, DIAGEO, Vodafone, Saxo Bank, YPO, Nielsen, TNS, and many more. We are already disrupting a US$60 billion industry.
PyData Singapore
Synopsis: There is more to Text Mining than TDM and TF-IDF. Come explore the world of Sentiment Analysis using Advanced Text Mining techniques with cutting-edge tools like Stanford's CoreNLP and analysing its output using Python. Speaker: Aditya Shankar is a Lecturer in the Intelligent Systems practice at the Institute of Systems Science in the National University of Singapore. He started his career consulting for Microsoft in Redmond, WA, Nike in Portland, OR, and T-Mobile in Seattle, WA. He then moved on to work for companies in the Healthcare domain, mostly healthcare providers in Tennessee.