Goto

Collaborating Authors

 Information Extraction


A Survey of Resources and Methods for Natural Language Processing of Serbian Language

arXiv.org Artificial Intelligence

The Serbian language is a Slavic language spoken by over 12 million speakers and well understood by over 15 million people. In the area of natural language processing, it can be considered a low-resourced language. Also, Serbian is considered a high-inflectional language. The combination of many word inflections and low availability of language resources makes natural language processing of Serbian challenging. Nevertheless, over the past three decades, there have been a number of initiatives to develop resources and methods for natural language processing of Serbian, ranging from developing a corpus of free text from books and the internet, annotated corpora for classification and named entity recognition tasks to various methods and models performing these tasks. In this paper, we review the initiatives, resources, methods, and their availability.


Transfer Learning for Low-Resource Sentiment Analysis

arXiv.org Artificial Intelligence

Sentiment analysis is the process of identifying and extracting subjective information from text. Despite the advances to employ cross-lingual approaches in an automatic way, the implementation and evaluation of sentiment analysis systems require language-specific data to consider various sociocultural and linguistic peculiarities. In this paper, the collection and annotation of a dataset are described for sentiment analysis of Central Kurdish. We explore a few classical machine learning and neural network-based techniques for this task. Additionally, we employ an approach in transfer learning to leverage pretrained models for data augmentation. We demonstrate that data augmentation achieves a high F$_1$ score and accuracy despite the difficulty of the task.


Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study

arXiv.org Artificial Intelligence

Recently, ChatGPT has drawn great attention from both the research community and the public. We are particularly curious about whether it can serve as a universal sentiment analyzer. To this end, in this work, we provide a preliminary evaluation of ChatGPT on the understanding of opinions, sentiments, and emotions contained in the text. Specifically, we evaluate it in four settings, including standard evaluation, polarity shift evaluation, open-domain evaluation, and sentiment inference evaluation. The above evaluation involves 18 benchmark datasets and 5 representative sentiment analysis tasks, and we compare ChatGPT with fine-tuned BERT and corresponding state-of-the-art (SOTA) models on end-task. Moreover, we also conduct human evaluation and present some qualitative case studies to gain a deep comprehension of its sentiment analysis capabilities.


STAGE: Span Tagging and Greedy Inference Scheme for Aspect Sentiment Triplet Extraction

arXiv.org Artificial Intelligence

Aspect Sentiment Triplet Extraction (ASTE) has become an emerging task in sentiment analysis research, aiming to extract triplets of the aspect term, its corresponding opinion term, and its associated sentiment polarity from a given sentence. Recently, many neural networks based models with different tagging schemes have been proposed, but almost all of them have their limitations: heavily relying on 1) prior assumption that each word is only associated with a single role (e.g., aspect term, or opinion term, etc. ) and 2) word-level interactions and treating each opinion/aspect as a set of independent words. Hence, they perform poorly on the complex ASTE task, such as a word associated with multiple roles or an aspect/opinion term with multiple words. Hence, we propose a novel approach, Span TAgging and Greedy infErence (STAGE), to extract sentiment triplets in span-level, where each span may consist of multiple words and play different roles simultaneously. To this end, this paper formulates the ASTE task as a multi-class span classification problem. Specifically, STAGE generates more accurate aspect sentiment triplet extractions via exploring span-level information and constraints, which consists of two components, namely, span tagging scheme and greedy inference strategy. The former tag all possible candidate spans based on a newly-defined tagging set. The latter retrieves the aspect/opinion term with the maximum length from the candidate sentiment snippet to output sentiment triplets. Furthermore, we propose a simple but effective model based on the STAGE, which outperforms the state-of-the-arts by a large margin on four widely-used datasets. Moreover, our STAGE can be easily generalized to other pair/triplet extraction tasks, which also demonstrates the superiority of the proposed scheme STAGE.


Co-regularization Based Semi-supervised Domain Adaptation

Neural Information Processing Systems

This paper presents a co-regularization based approach to semi-supervised domain adaptation. Our proposed approach (EA) builds on the notion of augmented space (introduced in EASYADAPT (EA) [1]) and harnesses unlabeled data in target domain to further enable the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement and can be applied as a pre-processing step to any supervised learner. Our theoretical analysis (in terms of Rademacher complexity) of EA and EA show that the hypothesis class of EA has lower complexity (compared to EA) and hence results in tighter generalization bounds. Experimental results on sentiment analysis tasks reinforce our theoretical findings and demonstrate the efficacy of the proposed method when compared to EA as well as a few other baseline approaches.


Boosting Multi-Modal E-commerce Attribute Value Extraction via Unified Learning Scheme and Dynamic Range Minimization

arXiv.org Artificial Intelligence

With the prosperity of e-commerce industry, various modalities, e.g., vision and language, are utilized to describe product items. It is an enormous challenge to understand such diversified data, especially via extracting the attribute-value pairs in text sequences with the aid of helpful image regions. Although a series of previous works have been dedicated to this task, there remain seldomly investigated obstacles that hinder further improvements: 1) Parameters from up-stream single-modal pretraining are inadequately applied, without proper jointly fine-tuning in a down-stream multi-modal task. 2) To select descriptive parts of images, a simple late fusion is widely applied, regardless of priori knowledge that language-related information should be encoded into a common linguistic embedding space by stronger encoders. 3) Due to diversity across products, their attribute sets tend to vary greatly, but current approaches predict with an unnecessary maximal range and lead to more potential false positives. To address these issues, we propose in this paper a novel approach to boost multi-modal e-commerce attribute value extraction via unified learning scheme and dynamic range minimization: 1) Firstly, a unified scheme is designed to jointly train a multi-modal task with pretrained single-modal parameters. 2) Secondly, a text-guided information range minimization method is proposed to adaptively encode descriptive parts of each modality into an identical space with a powerful pretrained linguistic model. 3) Moreover, a prototype-guided attribute range minimization method is proposed to first determine the proper attribute set of the current product, and then select prototypes to guide the prediction of the chosen attributes. Experiments on the popular multi-modal e-commerce benchmarks show that our approach achieves superior performance over the other state-of-the-art techniques.


How to Create a Sentiment Analysis Model From Scratch

#artificialintelligence

Sentiment analysis is a natural language processing (NLP) technique that identifies the attitude behind a text. It is also known as opinion mining. The goal of sentiment analysis is to identify whether a certain text has positive, negative, or neutral sentiment. It is widely used by businesses to automatically classify the sentiment in customer reviews. Analyzing large volumes of reviews helps gain valuable insights into the customers' preferences.


Polarity based Sarcasm Detection using Semigraph

arXiv.org Artificial Intelligence

Sarcasm is an advanced linguistic expression often found on various online platforms. Sarcasm detection is challenging in natural language processing tasks that affect sentiment analysis. This article presents the inventive method of the semigraph, including semigraph construction and sarcasm detection processes. A variation of the semigraph is suggested in the pattern-relatedness of the text document. The proposed method is to obtain the sarcastic and non-sarcastic polarity scores of a document using a semigraph. The sarcastic polarity score represents the possibility that a document will become sarcastic. Sarcasm is detected based on the polarity scoring model. The performance of the proposed model enhances the existing prior art approach to sarcasm detection. In the Amazon product review, the model achieved the accuracy, recall, and f-measure of 0.87, 0.79, and 0.83, respectively.


Negativity Spreads Faster: A Large-Scale Multilingual Twitter Analysis on the Role of Sentiment in Political Communication

arXiv.org Artificial Intelligence

Social media has become extremely influential when it comes to policy making in modern societies, especially in the western world, where platforms such as Twitter allow users to follow politicians, thus making citizens more involved in political discussion. In the same vein, politicians use Twitter to express their opinions, debate among others on current topics and promote their political agendas aiming to influence voter behaviour. In this paper, we attempt to analyse tweets of politicians from three European countries and explore the virality of their tweets. Previous studies have shown that tweets conveying negative sentiment are likely to be retweeted more frequently. By utilising state-of-the-art pre-trained language models, we performed sentiment analysis on hundreds of thousands of tweets collected from members of parliament in Greece, Spain and the United Kingdom, including devolved administrations. We achieved this by systematically exploring and analysing the differences between influential and less popular tweets. Our analysis indicates that politicians' negatively charged tweets spread more widely, especially in more recent times, and highlights interesting differences between political parties as well as between politicians and the general population.


Words that Wound: The Impact of Biased Language on News Sentiment and Stock Market Index

arXiv.org Artificial Intelligence

This study investigates the impact of biased language, specifically 'Words that Wound,' on sentiment analysis in a dataset of 45,379 South Korean daily economic news articles. Using Word2Vec, cosine similarity, and an expanded lexicon, we analyzed the influence of these words on news titles' sentiment scores. Our findings reveal that incorporating biased language significantly amplifies sentiment scores' intensity, particularly negativity. The research examines the effect of heightened negativity in news titles on the KOSPI200 index using linear regression and sentiment analysis. Results indicate that the augmented sentiment lexicon (Sent1000), which includes the top 1,000 negative words with high cosine similarity to 'Crisis,' more effectively captures the impact of news sentiment on the stock market index than the original KNU sentiment lexicon (Sent0). The ARDL model and Impulse Response Function (IRF) analyses disclose that Sent1000 has a stronger and more persistent impact on KOSPI200 compared to Sent0. These findings emphasize the importance of understanding language's role in shaping market dynamics and investor sentiment, particularly the impact of negatively biased language on stock market indices. The study highlights the need for considering context and linguistic nuances when analyzing news content and its potential effects on public opinion and market dynamics.