Information Extraction
Common NLP tasks and techniques. Sentiment analysis
Sentiment analysis, also known as opinion mining, is the process of determining the attitude or emotion of the writer towards a particular topic. Sentiment analysis is used in a wide range of applications, such as market research, brand management, and political analysis. There are several techniques used in sentiment analysis, including lexicon-based methods, which use a predefined list of words and their associated sentiment, and machine learning-based methods, which use algorithms to learn patterns in the data. Text classification is the process of automatically categorizing text into predefined categories or labels. Text classification has a wide range of applications, including spam detection, sentiment analysis, and topic classification.
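The lexicon-based approach mentioned above can be illustrated with a minimal sketch. The word list, its scores, and the one-word negation rule are all assumptions made for this example; they are not taken from any published lexicon.

```python
import re

# Hypothetical toy lexicon: word -> sentiment score (illustrative values only).
LEXICON = {
    "good": 1.0,
    "great": 2.0,
    "love": 2.0,
    "bad": -1.0,
    "terrible": -2.0,
    "hate": -2.0,
}

# Assumption: a negator flips the polarity of the word immediately after it.
NEGATORS = {"not", "no", "never"}


def lexicon_score(text: str) -> float:
    """Sum the lexicon scores of the words in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        if token in LEXICON:
            value = LEXICON[token]
            score += -value if negate else value
        # The negation only applies to the next token, so reset it here.
        negate = False
    return score


def lexicon_label(text: str) -> str:
    """Map the summed score to a coarse sentiment label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A machine learning-based method would replace the fixed dictionary with parameters learned from labeled examples; the sketch only shows the predefined-list idea.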
The use of new technologies to support Public Administration. Sentiment analysis and the case of the app IO
Miracula, Vincenzo, Picone, Antonio
Since 2005, there has been an increasing development of digitization within the public administration that sees the introduction of the use of technology as a privileged tool in the management of administrative activities. The main objective is to promote digitization in administrations in order to achieve greater efficiency in their activities in internal relations, between different administrations, and between the latter and private individuals. The entry of artificial intelligence into public action, however, needs to be accompanied by an adequate regulatory framework to guarantee the rights of those administered. The notion of digital transformation has gained significant attention in the literature[1]. Although approaches to the definition of digital transformation vary[2], most authors suggest that digital transformation involves the use of ICT technology to create fundamentally new capabilities in business, public administration[3] and people's lives[4].
Retrieving Users' Opinions on Social Media with Multimodal Aspect-Based Sentiment Analysis
Anschütz, Miriam, Eder, Tobias, Groh, Georg
People post their opinions and experiences on social media, yielding rich databases of end-users' sentiments. This paper shows to what extent machine learning can analyze and structure these databases. An automated data analysis pipeline is deployed to provide insights into user-generated content for researchers in other domains. First, the domain expert can select an image and a term of interest. Then, the pipeline uses image retrieval to find all images showing similar content and applies aspect-based sentiment analysis to outline users' opinions about the selected term. As part of an interdisciplinary project between architecture and computer science researchers, an empirical study of Hamburg's Elbphilharmonie was conducted. For this study, we selected 300 thousand posts with the hashtag "hamburg" from the platform Flickr. Image retrieval methods generated a subset of slightly more than 1.5 thousand images displaying the Elbphilharmonie. We found that these posts mainly convey a neutral or positive sentiment towards it. With this pipeline, we suggest a new semantic computing method that offers novel insights into end-users' opinions, e.g., for architecture domain experts.
Universal Information Extraction as Unified Semantic Matching
Lou, Jie, Lu, Yaojie, Dai, Dai, Jia, Wei, Lin, Hongyu, Han, Xianpei, Sun, Le, Wu, Hua
The challenge of information extraction (IE) lies in the diversity of label schemas and the heterogeneity of structures. Traditional methods require task-specific model design and rely heavily on expensive supervision, making them difficult to generalize to new schemas. In this paper, we decouple IE into two basic abilities, structuring and conceptualizing, which are shared by different tasks and schemas. Based on this paradigm, we propose to universally model various IE tasks with the Unified Semantic Matching (USM) framework, which introduces three unified token linking operations to model the abilities of structuring and conceptualizing. In this way, USM can jointly encode schema and input text, uniformly extract substructures in parallel, and controllably decode target structures on demand. Empirical evaluation on 4 IE tasks shows that the proposed method achieves state-of-the-art performance in the supervised experiments and shows strong generalization ability in zero/few-shot transfer settings.
CL-XABSA: Contrastive Learning for Cross-lingual Aspect-based Sentiment Analysis
Lin, Nankai, Fu, Yingwen, Lin, Xiaotian, Yang, Aimin, Jiang, Shengyi
As a widely studied task in natural language processing (NLP), aspect-based sentiment analysis (ABSA) predicts the sentiment expressed in a text relative to the corresponding aspect. Unfortunately, most languages lack sufficient annotation resources, so a growing number of researchers focus on cross-lingual aspect-based sentiment analysis (XABSA). However, most recent studies concentrate only on cross-lingual data alignment rather than model alignment. To this end, we propose a novel framework, CL-XABSA: Contrastive Learning for Cross-lingual Aspect-Based Sentiment Analysis. Based on contrastive learning, we reduce the distance between samples with the same label in different semantic spaces, thus achieving a convergence of the semantic spaces of different languages. Specifically, we design two contrastive strategies, token level contrastive learning of token embeddings (TL-CTE) and sentiment level contrastive learning of token embeddings (SL-CTE), to regularize the semantic spaces of the source and target languages to be more uniform. Since our framework can receive datasets in multiple languages during training, it can be adapted not only to the XABSA task but also to multilingual aspect-based sentiment analysis (MABSA). To further improve the performance of our model, we apply knowledge distillation, leveraging unlabeled data from the target language. In the distillation XABSA task, we further explore the comparative effectiveness of different data (source dataset, translated dataset, and code-switched dataset). The results demonstrate that the proposed method achieves a certain improvement in the three tasks of XABSA, distillation XABSA and MABSA. For reproducibility, our code for this paper is available at https://github.com/GKLMIP/CL-XABSA.
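The idea of pulling together samples that share a label can be sketched with a generic supervised contrastive loss. This is not the TL-CTE or SL-CTE objective from the paper, whose exact formulation is not given here; the function names, the cosine similarity, and the temperature value are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(
    embeddings: List[Sequence[float]],
    labels: List[int],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Average, over anchors, of -log softmax similarity to same-label samples."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner contributes nothing
        # Scaled similarities between the anchor and every other sample.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In a cross-lingual setting, the batch would mix source-language and target-language embeddings, so that minimizing the loss draws same-label samples from both languages toward each other.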
MEGAnno: Exploratory Labeling for NLP in Computational Notebooks
Zhang, Dan, Kim, Hannah, Chen, Rafael Li, Kandogan, Eser, Hruschka, Estevam
We present MEGAnno, a novel exploratory annotation framework designed for NLP researchers and practitioners. Unlike existing labeling tools that focus on data labeling only, our framework aims to support a broader, iterative ML workflow including data exploration and model development. With MEGAnno's API, users can programmatically explore the data through sophisticated search and automated suggestion functions and incrementally update the task schema as their project evolves. Combined with our widget, users can interactively sort, filter, and assign labels to multiple items simultaneously in the same notebook where the rest of the NLP project resides. We demonstrate MEGAnno's flexible, exploratory, efficient, and seamless labeling experience through a sentiment analysis use case.
Emotion Recognition from Microblog Managing Emoticon with Text and Classifying using 1D CNN
Habib, Md. Ahsan, Akhand, M. A. H., Kamal, Md. Abdus Samad
Microblogs, an online broadcast medium, are a widely used forum for people to share their thoughts and opinions. Emotion Recognition (ER) from microblogs has recently become an inspiring research topic in diverse areas. In the machine learning domain, automatic emotion recognition from microblogs is a challenging task, especially when better outcomes are sought across diverse content. Emoticons have become very common in the text of microblogs, as they reinforce the meaning of the content. This study proposes an emotion recognition scheme considering both the texts and emoticons from microblog data. Emoticons are considered unique expressions of the users' emotions and can be replaced by appropriate emotional words. The order in which emoticons appear in the microblog data is preserved, and a 1D Convolutional Neural Network (CNN) is employed for emotion classification. The experimental results show that the proposed emotion recognition scheme outperforms other existing methods when tested on Twitter data.
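The preprocessing idea of replacing emoticons with emotional words while keeping their position in the text can be sketched as follows. The emoticon table and the chosen words are hypothetical; the paper's actual mapping is not reproduced here, and the 1D CNN classifier itself is not shown.

```python
import re
from typing import List

# Hypothetical emoticon -> emotion word table (illustrative only).
EMOTICON_WORDS = {
    ":)": "happy",
    ":-)": "happy",
    ":(": "sad",
    ":-(": "sad",
    ":D": "joyful",
    ":'(": "crying",
}

# Try longer emoticons first so a long one is never split by a shorter match.
_EMOTICON_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICON_WORDS, key=len, reverse=True))
)


def replace_emoticons(text: str) -> str:
    """Substitute each emoticon in place, so its position in the text is kept."""
    return _EMOTICON_PATTERN.sub(lambda m: f" {EMOTICON_WORDS[m.group(0)]} ", text)


def tokenize(text: str) -> List[str]:
    """Lowercased whitespace tokens with emoticons turned into words."""
    return replace_emoticons(text).lower().split()
```

For example, `tokenize("Great game :D but we lost :(")` yields a token sequence in which "joyful" and "sad" sit where the emoticons were. A sequence like this could then be mapped to integer ids and fed to a sequence classifier.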
SAIDS: A Novel Approach for Sentiment Analysis Informed of Dialect and Sarcasm
Kaseb, Abdelrahman, Farouk, Mona
Sentiment analysis has become an essential part of every social network, as it enables decision-makers to know more about users' opinions in almost all aspects of life. Despite its importance, it encounters multiple issues, such as determining the sentiment of sarcastic text, which is one of the main challenges of sentiment analysis. This paper tackles this challenge by introducing a novel system (SAIDS) that predicts the sentiment, sarcasm and dialect of Arabic tweets. SAIDS uses its prediction of sarcasm and dialect as known information to predict the sentiment. It uses MARBERT as a language model to generate sentence embeddings, then passes them to the sarcasm and dialect models; the outputs of the three models are then concatenated and passed to the sentiment analysis model. Multiple system design setups were tested and reported. SAIDS was applied to the ArSarcasm-v2 dataset, where it outperforms the state-of-the-art model for the sentiment analysis task. By training all tasks together, SAIDS achieves results of 75.98 FPN, 59.09 F1-score and 71.13 F1-score for sentiment analysis, sarcasm detection, and dialect identification respectively. The system design can be used to enhance the performance of any task that depends on other tasks.
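The data flow described above (sentence embedding, then sarcasm and dialect predictions, then concatenation into the sentiment model) can be sketched with untrained toy layers. Everything concrete here is an assumption: the embedding is a random vector standing in for a MARBERT output, the layers are single linear maps with random weights, and the dimensions and class counts are placeholders rather than values from the paper.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], bias[out])


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Randomly initialised linear layer (a stand-in for a trained model)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x: List[float], layer: Layer) -> List[float]:
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(embedding: List[float], sarcasm_layer: Layer,
            dialect_layer: Layer, sentiment_layer: Layer):
    """Feed sarcasm and dialect predictions, with the embedding, to sentiment."""
    sarcasm = softmax(linear(embedding, sarcasm_layer))
    dialect = softmax(linear(embedding, dialect_layer))
    features = embedding + sarcasm + dialect  # list concatenation
    sentiment = softmax(linear(features, sentiment_layer))
    return sentiment, sarcasm, dialect


# Placeholder sizes (assumptions, not taken from the paper).
EMB_DIM, N_SARCASM, N_DIALECT, N_SENTIMENT = 8, 2, 5, 3
rng = random.Random(0)
sarcasm_layer = make_layer(EMB_DIM, N_SARCASM, rng)
dialect_layer = make_layer(EMB_DIM, N_DIALECT, rng)
sentiment_layer = make_layer(EMB_DIM + N_SARCASM + N_DIALECT, N_SENTIMENT, rng)
embedding = [rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]
sentiment, sarcasm, dialect = predict(
    embedding, sarcasm_layer, dialect_layer, sentiment_layer
)
```

The point of the sketch is the input width of the sentiment layer: it sees the embedding plus both auxiliary predictions, which is how the auxiliary tasks can inform the main one.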
Fine-Tuning of a Sentiment Analysis Task with Transformers-TensorFlow on Apple M1 Chip
A simple guide to fine-tuning a Transformers DistilBert model using TensorFlow on an Apple M1 chip for a sentiment analysis task. An issue occurred during the execution of model.fit(). After investigation, I found this solution, which works for TF 2.6 and forces the GPU to be the only device available to run the network. Read the CSV file and apply a lambda function to convert the labels from text to numbers: the positive label is 1 and the negative label is 0. The dataset is split into training, validation, and test sets in proportions of 70%, 15%, and 15%. For this copy-paste tutorial, distilbert-base-uncased has been used, so DistilBertTokenizerFast is used to tokenize the dataset; the output is in NumPy form.
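The label conversion and the 70/15/15 split described above can be sketched with the standard library alone. The column names `text` and `label`, the helper names, and the fixed shuffle seed are assumptions for this example; the tutorial's own code is not reproduced, and the tokenization step is not shown.

```python
import csv
import random
from typing import IO, List, Tuple

# Label mapping stated in the text: positive -> 1, negative -> 0.
LABEL_TO_ID = {"positive": 1, "negative": 0}

Row = Tuple[str, int]


def load_rows(csv_file: IO[str]) -> List[Row]:
    """Read (text, label) pairs; the column names are assumed."""
    reader = csv.DictReader(csv_file)
    return [(row["text"], LABEL_TO_ID[row["label"]]) for row in reader]


def split_70_15_15(rows: List[Row], seed: int = 42):
    """Shuffle a copy of the rows and split it into train/validation/test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    # Integer arithmetic avoids floating-point rounding in the cut points.
    n_train = len(rows) * 70 // 100
    n_val = len(rows) * 15 // 100
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test
```

With a real file, `load_rows(open("reviews.csv", newline="", encoding="utf-8"))` would produce the rows, where the file name is again hypothetical.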
Is word segmentation necessary for Vietnamese sentiment classification?
Nguyen, Duc-Vu, Nguyen, Ngan Luu-Thuy
To the best of our knowledge, this paper made the first attempt to answer whether word segmentation is necessary for Vietnamese sentiment classification. To do this, we presented five pre-trained monolingual S4-based language models for Vietnamese, including one model without word segmentation and four models using the RDRsegmenter, uitnlp, pyvi, or underthesea toolkits in the data pre-processing phase. Based on comprehensive experimental results on two corpora, the VLSP2016-SA corpus of technical article reviews from news and social media and the UIT-VSFC corpus of educational surveys, we make two suggestions. First, with traditional classifiers such as Naive Bayes or Support Vector Machines, word segmentation may not be necessary for a Vietnamese sentiment classification corpus that comes from the social domain. Second, word segmentation is necessary for Vietnamese sentiment classification when it is applied before the BPE method and the result is fed into a deep learning model. In this setting, RDRsegmenter is the most stable toolkit for word segmentation compared with the uitnlp, pyvi, and underthesea toolkits.