Information Extraction


Deep Context- and Relation-Aware Learning for Aspect-based Sentiment Analysis

arXiv.org Artificial Intelligence

Existing works for aspect-based sentiment analysis (ABSA) have adopted a unified approach, which allows interactive relations among subtasks. However, we observe that these methods tend to predict polarities based on the literal meaning of aspect and opinion terms and mainly consider relations among subtasks implicitly at the word level. In addition, identifying multiple aspect-opinion pairs with their polarities is much more challenging. Therefore, a comprehensive understanding of contextual information w.r.t. the aspect and opinion is further required in ABSA. In this paper, we propose the Deep Contextualized Relation-Aware Network (DCRAN), which allows interactive relations among subtasks with deep contextual information based on two modules (i.e., Aspect and Opinion Propagation and Explicit Self-Supervised Strategies). In particular, we design novel self-supervised strategies for ABSA, which have strengths in dealing with multiple aspects. Experimental results show that DCRAN significantly outperforms previous state-of-the-art methods by large margins on three widely used benchmarks.


Shareholder activists demand reforms from Amazon, Google, and Facebook • Data Protection News

#artificialintelligence

Investors and activists are presenting Alphabet, Amazon, Facebook, and Twitter with a list of shareholder resolutions this week that call for investigations into alleged racial bias in Amazon's facial recognition software and other surveillance products, stronger safeguards against the spread of disinformation on Facebook, and the establishment of stronger worker and human rights protections at all four companies. Shareholder advocates and activist allies held a press conference on Monday detailing several resolutions being presented this week and next to the boards of Alphabet, Amazon, Facebook, and Twitter. While the advocates didn't expect the resolutions to pass -- some of the company boards have reportedly already advised shareholders to vote against them -- an Alphabet union representative said her union might organize walkouts if Alphabet doesn't adopt the worker protection and civil and human rights reforms being presented to its board next month.


Facebook to be investigated over whether it is unfairly using personal data to push dating and shopping tools

The Independent - Tech

Regulators have opened an investigation into Facebook amid concerns it is using its vast troves of personal data to push its own shopping and dating tools. The probe by the UK's competition regulator will examine whether it is abusing its dominant position in online advertising. It comes amid growing antitrust concerns about the way many technology companies – not just Facebook but others such as Apple – have been able to use their vast size and hold on the market to unfairly benefit themselves. The Competition and Markets Authority (CMA) will look into how the social network gathers and uses certain data and whether it may provide an unfair advantage over rivals in the online classified ads and online dating space. As well as Facebook's advertising services, Facebook Login, a feature that allows people to sign into other websites and apps, will also form part of the probe.


T-BERT -- Model for Sentiment Analysis of Micro-blogs Integrating Topic Model and BERT

arXiv.org Artificial Intelligence

Sentiment analysis (SA) has become an extensive research area in recent years, impacting diverse fields including e-commerce, consumer business, and politics, driven by increasing adoption and usage of social media platforms. It is challenging to extract topics and sentiments from unsupervised short texts emerging in such contexts, as they may contain figurative words, strident data, and co-existence of many possible meanings for a single word or phrase, all contributing to obtaining incorrect topics. Most prior research is based on a specific theme/rhetoric/focused-content on a clean dataset. In the work reported here, the effectiveness of BERT (Bidirectional Encoder Representations from Transformers) in sentiment classification tasks from a raw live dataset taken from a popular microblogging platform is demonstrated. A novel T-BERT framework is proposed to show the enhanced performance obtainable by combining latent topics with contextual BERT embeddings. Numerical experiments were conducted on an ensemble with about 42000 datasets using the NimbleBox.ai platform with a hardware configuration consisting of Nvidia Tesla K80 (CUDA), 4 core CPU, 15GB RAM running on an isolated Google Cloud Platform instance. The empirical results show that the model's performance improves when topics are added to BERT, reaching an accuracy rate of 90.81% on sentiment classification using BERT with the proposed approach.
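
The abstract does not say how T-BERT combines latent topics with BERT embeddings. One common way to combine two feature sources is simple concatenation, and the sketch below illustrates only that idea. The function name `combine_topic_and_context`, the toy 4-dimensional vector (standing in for a real BERT embedding), and the three topic weights are all assumptions made for illustration, not details from the paper.

```python
def combine_topic_and_context(context_vec, topic_weights):
    """Concatenate a contextual embedding with a normalised topic distribution.

    Hypothetical helper: the actual T-BERT combination step may differ.
    """
    total = sum(topic_weights)
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")
    # Normalise raw topic weights so they sum to 1, like a topic-model output.
    topic_dist = [w / total for w in topic_weights]
    # The classifier would then see one feature vector holding both signals.
    return list(context_vec) + topic_dist


# Toy input: a 4-dim stand-in for a BERT sentence vector and 3 latent topics.
features = combine_topic_and_context([0.2, -0.1, 0.5, 0.3], [2.0, 1.0, 1.0])
print(len(features), features[4:])
```

The combined vector would then be passed to whatever classification layer sits on top; that layer is outside the scope of this sketch.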


A Span Extraction Approach for Information Extraction on Visually-Rich Documents

arXiv.org Artificial Intelligence

Information extraction (IE) from visually-rich documents (VRDs) has achieved SOTA performance recently thanks to the adaptation of Transformer-based language models, which demonstrates the great potential of pre-training methods. In this paper, we present a new approach to improve the capability of language model pre-training on VRDs. Firstly, we introduce a new IE model that is query-based and employs the span extraction formulation instead of the commonly used sequence labelling approach. Secondly, to further extend the span extraction formulation, we propose a new training task which focuses on modelling the relationships between semantic entities within a document. This task enables the spans to be extracted recursively and can be used both as a pre-training objective and as an IE downstream task. Evaluation on various datasets of popular business documents (invoices, receipts) shows that our proposed method can improve the performance of existing models significantly, while providing a mechanism to accumulate model knowledge from multiple downstream IE tasks.
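
To make the contrast with sequence labelling concrete, here is a minimal sketch of how a span extraction formulation is often decoded: the model scores every token as a possible span start and as a possible span end, and the answer is the highest-scoring valid (start, end) pair. This is a generic decoding routine, not the paper's model; `best_span`, `max_len`, and the toy scores are assumptions for illustration.

```python
def best_span(start_scores, end_scores, max_len=10):
    """Return the (start, end) token indices with the highest combined score.

    Generic span decoding; only spans with start <= end and at most
    `max_len` tokens are considered.
    """
    best, best_score = (0, 0), float("-inf")
    for i, start in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = start + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score


# Toy receipt line with made-up scores for the query "total amount".
tokens = ["Total", ":", "42", ".", "00", "USD"]
start_scores = [0.1, 0.0, 2.5, 0.0, 0.3, 0.1]
end_scores = [0.0, 0.1, 0.4, 0.2, 2.0, 1.0]

span, score = best_span(start_scores, end_scores)
answer = tokens[span[0]:span[1] + 1]
print(span, answer)
```

A sequence labelling model would instead assign a tag to each of the six tokens; the span formulation returns one contiguous answer per query.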


How to Create and Deploy a Simple Sentiment Analysis App via API - KDnuggets

#artificialintelligence

Let's say you've built an NLP model for some specific task, whether it be text classification, question answering, translation, or what have you. You've tested it out locally and it performs well. You've had others test it out as well, and it continues to perform well. Now you want to roll it out to a larger audience, be that audience a team of developers you work with, a specific group of end users, or even the general public. You have decided that you want to do so using a REST API, as you find this to be your best option.
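
As a rough illustration of the REST API route, the sketch below exposes a placeholder sentiment function over HTTP using only the Python standard library. Everything here is an assumption rather than the article's actual code: the lexicon-based `predict_sentiment` stands in for a real NLP model, and the names `handle_payload`, `SentimentHandler`, `make_server`, and the `{"text": ...}` request shape are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Tiny hand-made lexicons; a real deployment would call a trained model.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}


def predict_sentiment(text):
    """Placeholder model: compare counts of positive and negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def handle_payload(raw_body):
    """Turn a JSON request body into (HTTP status, JSON response body)."""
    try:
        text = json.loads(raw_body)["text"]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with a 'text' field"})
    return 200, json.dumps({"sentiment": predict_sentiment(str(text))})


class SentimentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_payload(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))


def make_server(port=8000):
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer(("127.0.0.1", port), SentimentHandler)
```

Keeping the request logic in `handle_payload` separates it from the HTTP plumbing, so it can be exercised without opening a socket.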


Validating GAN-BioBERT: A Methodology For Assessing Reporting Trends In Clinical Trials

arXiv.org Machine Learning

In the past decade, there has been much discussion about the issue of biased reporting in clinical research. Despite this attention, there have been limited tools developed for the systematic assessment of qualitative statements made in clinical research, with most studies assessing qualitative statements relying on the use of manual expert raters, which limits their size. Also, previous attempts to develop larger scale tools, such as those using natural language processing, were limited by both their accuracy and the number of categories used for the classification of their findings. With these limitations in mind, this study's goal was to develop a classification algorithm that was both suitably accurate and finely grained to be applied on a large scale for assessing the qualitative sentiment expressed in clinical trial abstracts. Additionally, this study seeks to compare the performance of the proposed algorithm, GAN-BioBERT, to previous studies as well as to expert manual rating of clinical trial abstracts. This study develops a three-class sentiment classification algorithm for clinical trial abstracts using a semi-supervised natural language processing model based on the Bidirectional Encoder Representations from Transformers (BERT) model, from a series of clinical trial abstracts annotated by a group of experts in academic medicine. The algorithm was found to have a classification accuracy of 91.3%, with a macro F1-Score of 0.92, which is a significant improvement in accuracy when compared to previous methods and expert ratings, while also making the sentiment classification finer grained than previous studies. The proposed algorithm, GAN-BioBERT, is a suitable classification model for the large-scale assessment of qualitative statements in clinical trial literature, providing an accurate, reproducible tool for the large-scale study of clinical publication trends.
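
For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and macro F1 for a three-class problem. Macro F1 is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of its size. The labels and predictions are invented toy data, not results from the study, and `accuracy_and_macro_f1` is a hypothetical helper name.

```python
def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for paired label lists."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    correct = sum(t == p for t, p in pairs)

    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return correct / len(pairs), sum(f1_scores) / len(f1_scores)


# Toy three-class example (positive / neutral / negative), six abstracts.
y_true = ["pos", "pos", "neu", "neu", "neg", "neg"]
y_pred = ["pos", "neu", "neu", "neu", "neg", "pos"]
acc, macro_f1 = accuracy_and_macro_f1(y_true, y_pred)
print(round(acc, 3), round(macro_f1, 3))
```

In this toy data the per-class F1 scores are 0.5 (pos), 0.8 (neu), and 2/3 (neg), which is why macro F1 can differ from plain accuracy.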


Correcting public opinion trends through Bayesian data assimilation

arXiv.org Artificial Intelligence

Measuring public opinion is a key focus during democratic elections, enabling candidates to gauge their popularity and alter their campaign strategies accordingly. Traditional survey polling remains the most popular estimation technique, despite its cost and time intensity, measurement errors, lack of real-time capabilities and lagged representation of public opinion. In recent years, Twitter opinion mining has attempted to combat these issues. Despite achieving promising results, it experiences its own set of shortcomings such as an unrepresentative sample population and a lack of long term stability. This paper aims to merge data from both these techniques using Bayesian data assimilation to arrive at a more accurate estimate of true public opinion for the Brexit referendum. This paper demonstrates the effectiveness of the proposed approach using Twitter opinion data and survey data from trusted pollsters. Firstly, the possible existence of a time gap of 16 days between the two data sets is identified. This gap is subsequently incorporated into a proposed assimilation architecture. This method was found to adequately incorporate information from both sources and measure a strong upward trend in Leave support leading up to the Brexit referendum. The proposed technique provides useful estimates of true opinion, which is essential to future opinion measurement and forecasting research.
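
The abstract does not describe the assimilation architecture in detail. As a rough intuition for how two noisy sources can be merged, the sketch below applies a scalar Gaussian (Kalman-style) update, treating the Twitter estimate as the prior and a lag-shifted poll as the observation. The function names, the variances, the toy numbers, and the direction in which the lag is applied are all assumptions for illustration and may not match the paper.

```python
def gaussian_update(prior_mean, prior_var, obs, obs_var):
    """Bayesian update of a Gaussian prior with one Gaussian observation."""
    gain = prior_var / (prior_var + obs_var)  # weight given to the observation
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1 - gain) * prior_var
    return post_mean, post_var


def assimilate(twitter, polls, lag, twitter_var, poll_var):
    """Fuse each Twitter estimate with the poll published `lag` days later.

    `polls` uses None for days without a poll. Pairing day t with day
    t + lag is an assumed alignment, chosen only for illustration.
    """
    fused = []
    for day, tw in enumerate(twitter):
        poll_day = day + lag
        if poll_day < len(polls) and polls[poll_day] is not None:
            fused.append(gaussian_update(tw, twitter_var, polls[poll_day], poll_var))
        else:
            # No matching poll: keep the Twitter estimate and its uncertainty.
            fused.append((tw, twitter_var))
    return fused


# Toy Leave-share series with a 2-day lag (the study reports 16 days).
twitter = [0.50, 0.52, 0.54]
polls = [None, None, 0.48, None, 0.50]
fused = assimilate(twitter, polls, lag=2, twitter_var=0.04, poll_var=0.04)
print(fused)
```

With equal variances the update lands halfway between the two sources and halves the variance; a more trusted source (smaller variance) would pull the fused estimate closer to itself.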


Sentiment analysis in tweets: an assessment study from classical to modern text representation models

arXiv.org Artificial Intelligence

With the growth of social media platforms such as Twitter, plenty of user-generated data emerges daily. The short texts published on Twitter -- the tweets -- have earned significant attention as a rich source of information to guide many decision-making processes. However, their inherent characteristics, such as the informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks, including sentiment analysis. Sentiment classification is tackled mainly by machine learning-based classifiers. The literature has adopted word representations of distinct natures to transform tweets into vector-based inputs to feed sentiment classifiers. The representations range from simple count-based methods, such as bag-of-words, to more sophisticated ones, such as BERTweet, built upon the trendy BERT architecture. Nevertheless, most studies mainly focus on evaluating those models using only a small number of datasets. Despite the progress made in recent years in language modelling, there is still a gap regarding a robust evaluation of induced embeddings applied to sentiment analysis on tweets. Furthermore, while fine-tuning the model on downstream tasks is prominent nowadays, less attention has been given to adjustments based on the specific linguistic style of the data. In this context, this study carries out an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets from distinct domains and five classification algorithms. The evaluation includes static and contextualized representations. Contexts are assembled from Transformer-based autoencoder models that are also fine-tuned based on the masked language model task, using a plethora of strategies.
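
As a concrete picture of the simplest end of that spectrum, the sketch below builds a bag-of-words representation: each tweet becomes a vector of word counts over a fixed vocabulary, with word order discarded. The whitespace tokenisation, the helper names `build_vocab` and `bag_of_words`, and the two toy tweets are assumptions for illustration; the study's actual preprocessing is not described in the abstract.

```python
from collections import Counter


def build_vocab(tweets):
    """Map every distinct lower-cased token to a column index."""
    tokens = sorted({tok for tweet in tweets for tok in tweet.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(tweet, vocab):
    """Count-based vector for one tweet; out-of-vocabulary words are ignored."""
    counts = Counter(tweet.lower().split())
    vector = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:
            vector[vocab[tok]] = n
    return vector


tweets = ["good good phone", "bad phone"]
vocab = build_vocab(tweets)  # columns: bad, good, phone
vectors = [bag_of_words(t, vocab) for t in tweets]
print(vectors)
```

Vectors like these can be fed directly to a classical classifier, whereas contextualized models such as BERTweet produce dense embeddings that depend on the surrounding words.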


Introduction to NLP with Disaster Tweets

#artificialintelligence

Natural Language Processing, also known as NLP, is a subfield of computer science, specifically artificial intelligence, that focuses on understanding written and spoken text. It covers various tasks, including speech recognition, sentiment analysis and language generation, and it has been applied in several use cases such as machine translation, spam detection, virtual assistants and chatbots. The project covered in this article is a sentiment analysis project called Natural Language Processing with Disaster Tweets. Sentiment analysis is the process of extracting subjective qualities from text, such as emotion or attitude. The objective of the project is to identify whether a specific tweet is about a real disaster or not. The project is ideal for beginners in NLP.