AITopics | Information Extraction


Information Extraction


A Survey of Resources and Methods for Natural Language Processing of Serbian Language

Marovac, Ulfeta A., Avdić, Aldina R., Milošević, Nikola Lj.

arXiv.org Artificial Intelligence, Apr-11-2023

The Serbian language is a Slavic language spoken by over 12 million speakers and well understood by over 15 million people. In the area of natural language processing, it can be considered a low-resource language. Serbian is also a highly inflected language. The combination of many word inflections and low availability of language resources makes natural language processing of Serbian challenging. Nevertheless, over the past three decades, there have been a number of initiatives to develop resources and methods for natural language processing of Serbian, ranging from corpora of free text from books and the internet and annotated corpora for classification and named entity recognition tasks to various methods and models performing these tasks. In this paper, we review the initiatives, resources, methods, and their availability.

large language model, machine learning, natural language, (18 more...)


2304.05468

Country:

Europe > Serbia > Central Serbia > Belgrade (0.04)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
Europe > Serbia > Šumadija and Western Serbia > Raška District > Novi Pazar (0.04)
(22 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry:

Government (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.93)
Materials > Metals & Mining (0.92)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)


Transfer Learning for Low-Resource Sentiment Analysis

Hameed, Razhan, Ahmadi, Sina, Daneshfar, Fatemeh

arXiv.org Artificial Intelligence, Apr-10-2023

Sentiment analysis is the process of identifying and extracting subjective information from text. Despite advances in applying cross-lingual approaches automatically, the implementation and evaluation of sentiment analysis systems require language-specific data that account for various sociocultural and linguistic peculiarities. In this paper, the collection and annotation of a dataset are described for sentiment analysis of Central Kurdish. We explore a few classical machine learning and neural network-based techniques for this task. Additionally, we employ a transfer learning approach to leverage pretrained models for data augmentation. We demonstrate that data augmentation achieves a high $F_1$ score and accuracy despite the difficulty of the task.

machine learning, natural language, sentiment analysis, (15 more...)


2304.04703

Country:

Asia > Middle East (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(3 more...)


Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study

Wang, Zengzhi, Xie, Qiming, Ding, Zixiang, Feng, Yi, Xia, Rui

arXiv.org Artificial Intelligence, Apr-9-2023

Recently, ChatGPT has drawn great attention from both the research community and the public. We are particularly curious about whether it can serve as a universal sentiment analyzer. To this end, we provide a preliminary evaluation of ChatGPT on the understanding of opinions, sentiments, and emotions contained in text. Specifically, we evaluate it in four settings: standard evaluation, polarity shift evaluation, open-domain evaluation, and sentiment inference evaluation. The evaluation involves 18 benchmark datasets and 5 representative sentiment analysis tasks, and we compare ChatGPT with fine-tuned BERT and the corresponding state-of-the-art (SOTA) models on each end task. Moreover, we also conduct human evaluation and present some qualitative case studies to gain a deeper understanding of its sentiment analysis capabilities.

large language model, machine learning, natural language, (18 more...)


2304.04339

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(17 more...)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(2 more...)


STAGE: Span Tagging and Greedy Inference Scheme for Aspect Sentiment Triplet Extraction

Liang, Shuo, Wei, Wei, Mao, Xian-Ling, Fu, Yuanyuan, Fang, Rui, Chen, Dangyang

arXiv.org Artificial Intelligence, Apr-9-2023

Aspect Sentiment Triplet Extraction (ASTE) has become an emerging task in sentiment analysis research, aiming to extract triplets of the aspect term, its corresponding opinion term, and its associated sentiment polarity from a given sentence. Recently, many neural network-based models with different tagging schemes have been proposed, but almost all of them have limitations: they rely heavily on 1) the prior assumption that each word is associated with only a single role (e.g., aspect term or opinion term) and 2) word-level interactions, treating each opinion/aspect as a set of independent words. As a result, they perform poorly on complex ASTE cases, such as a word associated with multiple roles or an aspect/opinion term with multiple words. We therefore propose a novel approach, Span TAgging and Greedy infErence (STAGE), to extract sentiment triplets at the span level, where each span may consist of multiple words and play different roles simultaneously. To this end, this paper formulates the ASTE task as a multi-class span classification problem. Specifically, STAGE generates more accurate aspect sentiment triplet extractions by exploring span-level information and constraints, and consists of two components: a span tagging scheme and a greedy inference strategy. The former tags all possible candidate spans based on a newly defined tagging set. The latter retrieves the aspect/opinion term with the maximum length from the candidate sentiment snippet to output sentiment triplets. Furthermore, we propose a simple but effective model based on STAGE, which outperforms the state of the art by a large margin on four widely used datasets. Moreover, STAGE can be easily generalized to other pair/triplet extraction tasks, which also demonstrates the superiority of the proposed scheme.
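As a rough illustration of the span-level view described above, the sketch below enumerates candidate spans of a sentence and picks the longest one from a set of candidates. It is a simplified sketch, not the authors' implementation: the function names, the `max_len` cutoff, and the tie-breaking rule are assumptions, and the tagging model that would score each span is omitted.

```python
def enumerate_spans(tokens, max_len):
    """Return (start, end) pairs, end exclusive, for every span of 1..max_len tokens."""
    n = len(tokens)
    return [
        (start, end)
        for start in range(n)
        for end in range(start + 1, min(start + max_len, n) + 1)
    ]


def longest_span(candidates):
    """Greedy choice: the longest candidate span; ties go to the earliest start (assumed rule)."""
    return max(candidates, key=lambda s: (s[1] - s[0], -s[0]))


tokens = ["the", "battery", "life", "is", "great"]
spans = enumerate_spans(tokens, max_len=2)

# Hypothetical candidates that a tagger might have marked as aspect terms.
aspect_candidates = [(1, 2), (1, 3), (2, 3)]
start, end = longest_span(aspect_candidates)
aspect_term = " ".join(tokens[start:end])
```

Here `aspect_term` is the multi-word term "battery life" rather than either single word, which is the kind of case that word-level tagging handles poorly.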

computational linguistic, machine learning, natural language, (15 more...)


2211.15003

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Dominican Republic (0.04)
(9 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.48)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)


Co-regularization Based Semi-supervised Domain Adaptation

Neural Information Processing Systems, Apr-6-2023, 13:23:39 GMT

This paper presents a co-regularization based approach to semi-supervised domain adaptation. Our proposed approach (EA++) builds on the notion of augmented space (introduced in EASYADAPT (EA) [1]) and harnesses unlabeled data in the target domain to further enable the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement and can be applied as a pre-processing step to any supervised learner. Our theoretical analysis (in terms of Rademacher complexity) of EA and EA++ shows that the hypothesis class of EA++ has lower complexity (compared to EA) and hence results in tighter generalization bounds. Experimental results on sentiment analysis tasks reinforce our theoretical findings and demonstrate the efficacy of the proposed method when compared to EA as well as a few other baseline approaches.
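The augmented-space idea that this approach builds on can be sketched as a pre-processing step. The mapping below is the commonly described EASYADAPT feature augmentation (a shared copy plus a domain-specific copy of each feature vector); it is stated here as an assumption, since the abstract does not spell it out, and the use of unlabeled target data is not implemented. The function name `augment` is hypothetical.

```python
import numpy as np


def augment(X, domain):
    """Map feature rows into a three-block augmented space (assumed EASYADAPT mapping).

    source rows: [x, x, 0]   (shared block, source block, empty target block)
    target rows: [x, 0, x]   (shared block, empty source block, target block)
    """
    X = np.asarray(X, dtype=float)
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])
    if domain == "target":
        return np.hstack([X, zeros, X])
    raise ValueError("domain must be 'source' or 'target'")


source_aug = augment([[1.0, 2.0]], "source")
target_aug = augment([[3.0, 4.0]], "target")

# The stacked rows can be passed to any supervised learner unchanged.
train_X = np.vstack([source_aug, target_aug])
```

Because the augmentation only changes the feature representation, any supervised learner can be trained on `train_X` without modification, which matches the "pre-processing step" framing in the abstract.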

co-regularization, complexity, semi-supervised domain adaptation


Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.87)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.31)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.31)


Boosting Multi-Modal E-commerce Attribute Value Extraction via Unified Learning Scheme and Dynamic Range Minimization

Liu, Mengyin, Zhu, Chao, Gao, Hongyu, Gu, Weibo, Wang, Hongfa, Liu, Wei, Yin, Xu-cheng

arXiv.org Artificial Intelligence, Apr-6-2023

With the prosperity of the e-commerce industry, various modalities, e.g., vision and language, are utilized to describe product items. It is an enormous challenge to understand such diversified data, especially when extracting attribute-value pairs from text sequences with the aid of helpful image regions. Although a series of previous works have been dedicated to this task, there remain seldom-investigated obstacles that hinder further improvements: 1) Parameters from upstream single-modal pretraining are inadequately applied, without proper joint fine-tuning in a downstream multi-modal task. 2) To select descriptive parts of images, a simple late fusion is widely applied, disregarding the prior knowledge that language-related information should be encoded into a common linguistic embedding space by stronger encoders. 3) Due to diversity across products, their attribute sets tend to vary greatly, but current approaches predict over an unnecessarily maximal range, which leads to more potential false positives. To address these issues, we propose a novel approach to boost multi-modal e-commerce attribute value extraction via a unified learning scheme and dynamic range minimization: 1) First, a unified scheme is designed to jointly train a multi-modal task with pretrained single-modal parameters. 2) Second, a text-guided information range minimization method is proposed to adaptively encode descriptive parts of each modality into an identical space with a powerful pretrained linguistic model. 3) Moreover, a prototype-guided attribute range minimization method is proposed to first determine the proper attribute set of the current product, and then select prototypes to guide the prediction of the chosen attributes. Experiments on popular multi-modal e-commerce benchmarks show that our approach achieves superior performance over other state-of-the-art techniques.

artificial intelligence, machine learning, natural language, (17 more...)


2207.07278

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Promising Solution (0.68)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.64)


How to Create a Sentiment Analysis Model From Scratch

#artificialintelligence, Apr-4-2023, 10:07:27 GMT

Sentiment analysis is a natural language processing (NLP) technique that identifies the attitude behind a text. It is also known as opinion mining. The goal of sentiment analysis is to identify whether a certain text has positive, negative, or neutral sentiment. It is widely used by businesses to automatically classify the sentiment in customer reviews. Analyzing large volumes of reviews helps gain valuable insights into the customers' preferences.
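To make the classification goal above concrete, here is a minimal sketch of a sentiment classifier built without any machine learning library. It uses multinomial naive Bayes with add-one smoothing, which is one common baseline and not necessarily the method the linked article uses; the function names and the four training sentences are hypothetical toy data.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability under naive Bayes."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy reviews, invented for illustration only.
reviews = [
    ("great product love it", "positive"),
    ("love the great design", "positive"),
    ("terrible quality very bad", "negative"),
    ("bad terrible experience", "negative"),
]
model = train(reviews)
label_a = predict(model, "great design")
label_b = predict(model, "bad quality")
```

A real review classifier would need far more data, proper tokenization, and a held-out test set; the sketch only shows the shape of the train and predict steps.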

dataset, sentiment, training and testing, (13 more...)


Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)


Polarity based Sarcasm Detection using Semigraph

Mane, Swapnil, Khatavkar, Vaibhav

arXiv.org Artificial Intelligence, Apr-3-2023

Sarcasm is an advanced linguistic expression often found on various online platforms. Sarcasm detection is a challenging natural language processing task that affects sentiment analysis. This article presents a novel semigraph-based method, covering both semigraph construction and the sarcasm detection process. A variation of the semigraph is suggested based on the pattern-relatedness of the text document. The proposed method obtains the sarcastic and non-sarcastic polarity scores of a document using a semigraph. The sarcastic polarity score represents the likelihood that a document is sarcastic. Sarcasm is detected based on the polarity scoring model. The proposed model improves on the existing prior-art approach to sarcasm detection. On Amazon product reviews, the model achieved an accuracy, recall, and f-measure of 0.87, 0.79, and 0.83, respectively.
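For readers checking the reported scores, the f-measure is conventionally the harmonic mean of precision and recall. The abstract does not report precision, so the short sketch below assumes, purely for illustration, that the 0.87 figure plays the role of precision; under that assumption the harmonic mean with a recall of 0.79 rounds to the reported 0.83.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    return 2 * precision * recall / (precision + recall)


# Assumption: 0.87 is treated as precision; the abstract labels it "accuracy".
score = f_measure(0.87, 0.79)
rounded = round(score, 2)
```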

artificial intelligence, machine learning, natural language, (17 more...)


2304.01424

Country:

North America > United States > Virginia (0.04)
Asia > Singapore (0.04)
Asia > India > Maharashtra > Pune (0.04)

Genre: Research Report (1.00)

Industry: Retail (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.88)
(2 more...)


Negativity Spreads Faster: A Large-Scale Multilingual Twitter Analysis on the Role of Sentiment in Political Communication

Antypas, Dimosthenis, Preece, Alun, Camacho-Collados, Jose

arXiv.org Artificial Intelligence, Apr-3-2023

Social media has become extremely influential when it comes to policy making in modern societies, especially in the western world, where platforms such as Twitter allow users to follow politicians, thus making citizens more involved in political discussion. In the same vein, politicians use Twitter to express their opinions, debate with others on current topics, and promote their political agendas, aiming to influence voter behaviour. In this paper, we attempt to analyse tweets of politicians from three European countries and explore the virality of their tweets. Previous studies have shown that tweets conveying negative sentiment are likely to be retweeted more frequently. By utilising state-of-the-art pre-trained language models, we performed sentiment analysis on hundreds of thousands of tweets collected from members of parliament in Greece, Spain and the United Kingdom, including devolved administrations. We achieved this by systematically exploring and analysing the differences between influential and less popular tweets. Our analysis indicates that politicians' negatively charged tweets spread more widely, especially in more recent times, and highlights interesting differences between political parties as well as between politicians and the general population.

machine learning, natural language, tweet, (20 more...)


doi: 10.1016/j.osnem.2023.100242

2202.00396

Country:

Europe > Greece (0.27)
Europe > United Kingdom > Northern Ireland (0.14)
North America > Haiti (0.14)
(18 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry:

Information Technology > Services (1.00)
Government > Voting & Elections (1.00)
Government > Regional Government (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


Words that Wound: The Impact of Biased Language on News Sentiment and Stock Market Index

Kim, Wonseong

arXiv.org Artificial Intelligence, Apr-2-2023

This study investigates the impact of biased language, specifically 'Words that Wound,' on sentiment analysis in a dataset of 45,379 South Korean daily economic news articles. Using Word2Vec, cosine similarity, and an expanded lexicon, we analyzed the influence of these words on news titles' sentiment scores. Our findings reveal that incorporating biased language significantly amplifies sentiment scores' intensity, particularly negativity. The research examines the effect of heightened negativity in news titles on the KOSPI200 index using linear regression and sentiment analysis. Results indicate that the augmented sentiment lexicon (Sent1000), which includes the top 1,000 negative words with high cosine similarity to 'Crisis,' more effectively captures the impact of news sentiment on the stock market index than the original KNU sentiment lexicon (Sent0). The ARDL model and Impulse Response Function (IRF) analyses disclose that Sent1000 has a stronger and more persistent impact on KOSPI200 compared to Sent0. These findings emphasize the importance of understanding language's role in shaping market dynamics and investor sentiment, particularly the impact of negatively biased language on stock market indices. The study highlights the need for considering context and linguistic nuances when analyzing news content and its potential effects on public opinion and market dynamics.
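The lexicon-expansion step described above (ranking words by cosine similarity to a seed word such as 'Crisis' and keeping the top entries) can be sketched as follows. This is a minimal sketch under stated assumptions: the two-dimensional vectors are invented toy values standing in for trained Word2Vec embeddings, and `expand_lexicon` is a hypothetical helper, not the study's code.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def expand_lexicon(embeddings, seed, k):
    """Return the k words most similar to the seed word, with their scores."""
    seed_vec = embeddings[seed]
    scored = [
        (word, cosine_similarity(seed_vec, vec))
        for word, vec in embeddings.items()
        if word != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings, invented for illustration; real vectors would come from Word2Vec.
toy_embeddings = {
    "crisis": np.array([1.0, 0.0]),
    "collapse": np.array([0.9, 0.1]),
    "shock": np.array([0.7, 0.7]),
    "growth": np.array([-1.0, 0.2]),
}

top_words = [word for word, _ in expand_lexicon(toy_embeddings, "crisis", k=2)]
```

With trained embeddings, setting `k` to 1,000 would correspond to the size of the augmented lexicon mentioned in the abstract; how the study filters for negative words is not shown here.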

machine learning, natural language, sentiment, (20 more...)


2304.00468

Country: Asia > South Korea (0.35)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
