Information Extraction
Structured Self-Attention Weights Encode Semantics in Sentiment Analysis
Wu, Zhengxuan, Nguyen, Thanh-Son, Ong, Desmond C.
Neural attention, especially the self-attention made popular by the Transformer, has become the workhorse of state-of-the-art natural language processing (NLP) models. Very recent work suggests that the self-attention in the Transformer encodes syntactic information; Here, we show that self-attention scores encode semantics by considering sentiment analysis tasks. In contrast to gradient-based feature attribution methods, we propose a simple and effective Layer-wise Attention Tracing (LAT) method to analyze structured attention weights. We apply our method to Transformer models trained on two tasks that have surface dissimilarities, but share common semantics---sentiment analysis of movie reviews and time-series valence prediction in life story narratives. Across both tasks, words with high aggregated attention weights were rich in emotional semantics, as quantitatively validated by an emotion lexicon labeled by human annotators. Our results show that structured attention weights encode rich semantics in sentiment analysis, and match human interpretations of semantics.
MITA: An Information-Extraction Approach to the Analysis of Free-Form Text in Life Insurance Applications
MetLife processes over 260,000 life insurance applications a year. Underwriting of these applications is labor intensive. Automation is difficult because the applications include many free-form text fields. MetLife's intelligent text analyzer (MITA) uses the information-extraction technique of natural language processing to structure the extensive textual fields on a life insurance application. Knowledge engineering, with the help of underwriters as domain experts, was performed to elicit significant concepts for both medical and occupational textual fields.
Deploying nEmesis: Preventing Foodborne Illness by Data Mining Social Media
Foodborne illness afflicts 48 million people annually in the U.S. alone. Over 128,000 are hospitalized and 3,000 die from the infection. While preventable with proper food safety practices, the traditional restaurant inspection process has limited impact given the predictability and low frequency of inspections, and the dynamic nature of the kitchen environment. Despite this reality, the inspection process has remained largely unchanged for decades. CDC has even identified food safety as one of seven "winnable battles"; however, progress to date has been limited.
A Multi-Task Incremental Learning Framework with Category Name Embedding for Aspect-Category Sentiment Analysis
Dai, Zehui, Peng, Cheng, Chen, Huajie, Ding, Yadong
(T)ACSA tasks, including aspect-category sentiment analysis (ACSA) and targeted aspect-category sentiment analysis (TACSA), aims at identifying sentiment polarity on predefined categories. Incremental learning on new categories is necessary for (T)ACSA real applications. Though current multi-task learning models achieve good performance in (T)ACSA tasks, they suffer from catastrophic forgetting problems in (T)ACSA incremental learning tasks. In this paper, to make multi-task learning feasible for incremental learning, we proposed Category Name Embedding network (CNE-net). We set both encoder and decoder shared among all categories to weaken the catastrophic forgetting problem. Besides the origin input sentence, we applied another input feature, i.e., category name, for task discrimination. Our model achieved state-of-the-art on two (T)ACSA benchmark datasets. Furthermore, we proposed a dataset for (T)ACSA incremental learning and achieved the best performance compared with other strong baselines.
SubjQA: A Dataset for Subjectivity and Review Comprehension
Bjerva, Johannes, Bhutani, Nikita, Golshan, Behzad, Tan, Wang-Chiew, Augenstein, Isabelle
Subjectivity is the expression of internal opinions or beliefs which cannot be objectively observed or verified, and has been shown to be important for sentiment analysis and word-sense disambiguation. Furthermore, subjectivity is an important aspect of user-generated data. In spite of this, subjectivity has not been investigated in contexts where such data is widespread, such as in question answering (QA). We therefore investigate the relationship between subjectivity and QA, while developing a new dataset. We compare and contrast with analyses from previous work, and verify that findings regarding subjectivity still hold when using recently developed NLP architectures. We find that subjectivity is also an important feature in the case of QA, albeit with more intricate interactions between subjectivity and QA performance. For instance, a subjective question may or may not be associated with a subjective answer. We release an English QA dataset (SubjQA) based on customer reviews, containing subjectivity annotations for questions and answer spans across 6 distinct domains.
Aspect-Based Sentiment Analysis in Education Domain
Hajrizi, Rinor, Nuçi, Krenare Pireva
Analysis of a large amount of data has always brought value to institutions and organizations. Lately, people's opinions expressed through text have become a very important aspect of this analysis. In response to this challenge, a natural language processing technique known as Aspect-Based Sentiment Analysis (ABSA) has emerged. Having the ability to extract the polarity for each aspect of opinions separately, ABSA has found itself useful in a wide range of domains. Education is one of the domains in which ABSA can be successfully utilized. Being able to understand and find out what students like and don't like most about a course, professor, or teaching methodology can be of great importance for the respective institutions. While this task represents a unique NLP challenge, many studies have proposed different approaches to tackle the problem. In this work, we present a comprehensive review of the existing work in ABSA with a focus in the education domain. A wide range of methodologies are discussed and conclusions are drawn.
Hands-On Guide To Using AutoNLP For Automating Sentiment Analysis
Automated Machine learning or autoML is used for automating the complete process of machine learning for real-world problems to make the process easier and more efficient. Over the years researchers have developed ways of automating processes by developing tools like AutoKeras, AutoSklearn and even no-coding platforms like WEKA and H2o. One such area of automation is in the field of natural language processing. With the development of AutoNLP, it is now super easy to build a model like sentiment analysis with very few basic lines of code and get a good output. With automation like these, it allows everyone to be a part of the machine learning community and does not restrict machine learning to only developers and engineers.
Legal Sentiment Analysis and Opinion Mining (LSAOM): Assimilating Advances in Autonomous AI Legal Reasoning
An expanding field of substantive interest for the theory of the law and the practice-of-law entails Legal Sentiment Analysis and Opinion Mining (LSAOM), consisting of two often intertwined phenomena and actions underlying legal discussions and narratives: (1) Sentiment Analysis (SA) for the detection of expressed or implied sentiment about a legal matter within the context of a legal milieu, and (2) Opinion Mining (OM) for the identification and illumination of explicit or implicit opinion accompaniments immersed within legal discourse. Efforts to undertake LSAOM have historically been performed by human hand and cognition, and only thinly aided in more recent times by the use of computer-based approaches. Advances in Artificial Intelligence (AI) involving especially Natural Language Processing (NLP) and Machine Learning (ML) are increasingly bolstering how automation can systematically perform either or both of Sentiment Analysis and Opinion Mining, all of which is being inexorably carried over into engagement within a legal context for improving LSAOM capabilities. This research paper examines the evolving infusion of AI into Legal Sentiment Analysis and Opinion Mining and proposes an alignment with the Levels of Autonomy (LoA) of AI Legal Reasoning (AILR), plus provides additional insights regarding AI LSAOM in its mechanizations and potential impact to the study of law and the practicing of law.
The Role of Data Science in Sentiment Analysis
Understanding individuals' feelings are fundamental for organizations since clients can communicate their feelings and sentiments more transparently than ever before. By automatically analyzing customer feedback, from study reactions to social media discussions, brands can listen mindfully to their clients, and tailor products and services to address their issues. Sentiment analysis is a machine learning method that recognizes polarity (for example a positive or negative thought) within the text, whether a whole document, paragraph, sentence, or clause. Marketing is ending up being one of the artworks most disrupted by the digital revolution. A lot to the aversion of customary marketing proponents and maybe to the pleasure of technologists, it is presently a lot about codifying the whole knowledge chain – catching the abundance of digital data, sorting out it, applying algorithms to process it and taking care of back noteworthy decisions to different functions– all in real-time, with end to end automation, and at lightening quick speed.
Gender prediction using limited Twitter Data
Burghoorn, Maaike, de Boer, Maaike H. T., Raaijmakers, Stephan
Transformer models have shown impressive performance on a variety of NLP tasks. Off-the-shelf, pre-trained models can be fine-tuned for specific NLP classification tasks, reducing the need for large amounts of additional training data. However, little research has addressed how much data is required to accurately fine-tune such pre-trained transformer models, and how much data is needed for accurate prediction. This paper explores the usability of BERT (a Transformer model for word embedding) for gender prediction on social media. Forensic applications include detecting gender obfuscation, e.g. males posing as females in chat rooms. A Dutch BERT model is fine-tuned on different samples of a Dutch Twitter dataset labeled for gender, varying in the number of tweets used per person. The results show that finetuning BERT contributes to good gender classification performance (80% F1) when finetuned on only 200 tweets per person. But when using just 20 tweets per person, the performance of our classifier deteriorates non-steeply (to 70% F1). These results show that even with relatively small amounts of data, BERT can be fine-tuned to accurately help predict the gender of Twitter users, and, consequently, that it is possible to determine gender on the basis of just a low volume of tweets. This opens up an operational perspective on the swift detection of gender.