AITopics | Information Extraction


Information Extraction


Mirror: A Universal Framework for Various Information Extraction Tasks

Zhu, Tong, Ren, Junfei, Yu, Zijian, Wu, Mengsong, Zhang, Guoliang, Qu, Xiaoye, Chen, Wenliang, Wang, Zhefeng, Huai, Baoxing, Zhang, Min

arXiv.org Artificial Intelligence, Nov-26-2023

Sharing knowledge between information extraction tasks has always been a challenge due to the diverse data formats and task variations. Meanwhile, this divergence leads to information waste and increases difficulties in building complex applications in real scenarios. Recent studies often formulate IE tasks as a triplet extraction problem. However, such a paradigm does not support multi-span and n-ary extraction, leading to weak versatility. To this end, we reorganize IE problems into unified multi-slot tuples and propose a universal framework for various IE tasks, namely Mirror. Specifically, we recast existing IE tasks as a multi-span cyclic graph extraction problem and devise a non-autoregressive graph decoding algorithm to extract all spans in a single step. It is worth noting that this graph structure is incredibly versatile: it supports not only complex IE tasks, but also machine reading comprehension and classification tasks. We manually construct a corpus containing 57 datasets for model pretraining, and conduct experiments on 30 datasets across 8 downstream tasks. The experimental results demonstrate that our model has decent compatibility and outperforms or performs competitively with SOTA systems under few-shot and zero-shot settings. The code, model weights, and pretraining corpus are available at https://github.com/Spico197/Mirror .
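The abstract's key move is replacing fixed head-relation-tail triplets with "multi-slot tuples" whose number of spans can vary. The toy sketch below only illustrates why such a record generalizes triplets (one span for an entity, two for a relation, three or more for an n-ary fact). It is an assumption-laden illustration, not Mirror's actual data format or its graph decoding algorithm; the names MultiSlotTuple and render are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical span representation: (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


@dataclass(frozen=True)
class MultiSlotTuple:
    """Toy unified IE record: a label plus an ordered, variable-length tuple of spans."""
    label: str
    spans: Tuple[Span, ...]


def render(text: str, record: MultiSlotTuple) -> Tuple[str, ...]:
    """Return the label followed by the surface string of each span slot."""
    return (record.label,) + tuple(text[start:end] for start, end in record.spans)


text = "Alice founded Acme in 2001."
# Entity mention: a single slot.
ner = MultiSlotTuple("person", ((0, 5),))
# Binary relation: two slots, equivalent to a triplet.
rel = MultiSlotTuple("founded", ((0, 5), (14, 18)))
# n-ary fact: three slots, which a single triplet cannot hold directly.
event = MultiSlotTuple("founding", ((0, 5), (14, 18), (22, 26)))

for record in (ner, rel, event):
    print(render(text, record))
```

Because every task shares the same record type, a single decoder can in principle emit all of them; how Mirror actually links the slots (its cyclic span graph) is described in the paper, not here.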

computational linguistic, dataset, proceedings, (12 more...)

arXiv.org Artificial Intelligence

2311.05419

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(18 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Education (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)


nlpBDpatriots at BLP-2023 Task 2: A Transfer Learning Approach to Bangla Sentiment Analysis

Goswami, Dhiman, Raihan, Md Nishat, Puspo, Sadiya Sayara Chowdhury, Zampieri, Marcos

arXiv.org Artificial Intelligence, Nov-25-2023

In this paper, we discuss the nlpBDpatriots entry to the shared task on Sentiment Analysis of Bangla Social Media Posts organized at the first workshop on Bangla Language Processing (BLP) co-located with EMNLP. The main objective of this task is to identify the polarity of social media content using a Bangla dataset annotated with positive, neutral, and negative labels provided by the shared task organizers. Our best system for this task is a transfer learning approach with data augmentation which achieved a micro F1 score of 0.71. Our best system ranked 12th among 30 teams that participated in the competition.
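The shared task is scored with micro F1 over three labels (positive, neutral, negative). A minimal sketch of micro F1 for single-label classification follows; it is a generic definition, not the organizers' official scorer, and the function name micro_f1 is illustrative. It also shows why, in this single-label setting, micro F1 equals plain accuracy: every misclassified example counts once as a false positive (for the predicted class) and once as a false negative (for the gold class).

```python
from typing import Sequence


def micro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Micro-averaged F1 for single-label multiclass predictions.

    Counts are pooled over all classes before computing precision and recall.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    tp = sum(g == p for g, p in zip(gold, pred))
    # Each error is a false positive for the predicted label and a
    # false negative for the gold label, so fp == fn == number of errors.
    fp = fn = len(gold) - tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["positive", "neutral", "negative", "positive"]
pred = ["positive", "negative", "negative", "neutral"]
print(micro_f1(gold, pred))  # 2 of 4 correct, so 0.5
```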

dataset, sentiment analysis, transformer-based model, (13 more...)

arXiv.org Artificial Intelligence

2311.15032

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.77)
(2 more...)


Enhancing Sentiment Analysis Results through Outlier Detection Optimization

Chen, Yuetian, Si, Mei

arXiv.org Artificial Intelligence, Nov-25-2023

When dealing with text data containing subjective labels like speaker emotions, inaccuracies or discrepancies among labelers are not uncommon. Such discrepancies can significantly affect the performance of machine learning algorithms. This study investigates the potential of identifying and addressing outliers in text data with subjective labels, aiming to enhance classification outcomes. We utilized the Deep SVDD algorithm, a one-class classification method, to detect outliers in nine text-based emotion and sentiment analysis datasets. Using both a small-sized language model (DistilBERT base model with 66 million parameters) and non-deep-learning machine learning algorithms (decision tree, KNN, Logistic Regression, and LDA) as the classifier, we found that the removal of outliers can lead to enhanced results in most cases. Additionally, as outliers in such datasets are not necessarily unlearnable, we experimented with a large language model, DeBERTa v3 large with 131 million parameters, which can capture very complex patterns in data. We continued to observe performance enhancements across multiple datasets.
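In Deep SVDD, a neural network is trained to map normal data close to a center c, and the squared distance to c serves as the anomaly score. The sketch below keeps only the scoring-and-filtering step under loud assumptions: it takes fixed, precomputed feature vectors (no network is trained), uses their mean as c, and drops a hypothetical drop_fraction of the farthest examples before classifier training. It is not the authors' pipeline; filter_outliers, center_of, and drop_fraction are illustrative names.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def center_of(vectors: Sequence[Vector]) -> List[float]:
    """Per-dimension mean of the feature vectors, used here as the center c."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance ||a - b||^2, the SVDD-style anomaly score."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filter_outliers(
    features: Sequence[Vector], labels: Sequence[str], drop_fraction: float
) -> List[Tuple[Vector, str]]:
    """Drop the drop_fraction of examples farthest from the center, keep the rest in order."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    c = center_of(features)
    scores = [squared_distance(f, c) for f in features]
    n_drop = int(len(features) * drop_fraction)
    # Indices ordered from most to least anomalous.
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [(features[i], labels[i]) for i in range(len(features)) if i not in dropped]


features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
labels = ["joy", "joy", "sad", "joy"]
kept = filter_outliers(features, labels, drop_fraction=0.25)
print([label for _, label in kept])  # the far-away [5.0, 5.0] example is removed
```

The classifier (DistilBERT, decision tree, and so on in the study) would then be trained only on the kept examples.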

dataset, outlier, threshold, (16 more...)

arXiv.org Artificial Intelligence

2311.16185

Country:

Europe > United Kingdom (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(2 more...)

Genre: Research Report > New Finding (0.89)

Industry: Health & Medicine > Health Care Technology > Telehealth (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)


tieval: An Evaluation Framework for Temporal Information Extraction Systems

Sousa, Hugo, Jorge, Alípio, Campos, Ricardo

arXiv.org Artificial Intelligence, Nov-24-2023

Temporal information extraction (TIE) has attracted a great deal of interest over the last two decades, leading to the development of a significant number of datasets. Despite its benefits, having access to a large volume of corpora makes benchmarking TIE systems difficult. On the one hand, different datasets have different annotation schemes, hindering the comparison of competing systems across corpora. On the other hand, because each corpus is commonly disseminated in a different format, a researcher or practitioner must invest considerable engineering effort to develop parsers for all of them. This constraint forces researchers to select a limited number of datasets to evaluate their systems, which in turn limits the comparability of those systems. Yet another obstacle to the comparability of TIE systems is the evaluation metric employed. While most research works adopt traditional metrics such as precision, recall, and F1, a few others prefer temporal awareness, a metric tailored to evaluate temporal systems more comprehensively. Although the reason for the absence of temporal awareness in the evaluation of most systems is not clear, one factor that certainly weighs on this decision is that computing temporal awareness requires the temporal closure algorithm, which is neither straightforward to implement nor readily available. All in all, these problems have limited the fair comparison between approaches and, consequently, the development of temporal extraction systems. To mitigate these problems, we have developed tieval, a Python library that provides a concise interface for importing different corpora and facilitates system evaluation. In this paper, we present the first public release of tieval and highlight its most relevant features.
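To see why temporal awareness depends on temporal closure, consider the simplified sketch below. It assumes only BEFORE relations (so closure reduces to transitive closure) and scores a system relation as correct if the closed reference graph entails it, and a reference relation as recalled if the closed system graph entails it. The full metric also handles the other interval relations and graph reduction, so this is an illustration of the idea, not tieval's API or implementation; closure and awareness_scores are hypothetical names.

```python
from itertools import product
from typing import Set, Tuple

# (a, b) means "event a BEFORE event b"; only this one relation type is modelled.
Relation = Tuple[str, str]


def closure(relations: Set[Relation]) -> Set[Relation]:
    """Transitive closure of BEFORE: a<b and b<c imply a<c."""
    closed = set(relations)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow safely inside the loop.
        for (a, b), (c, d) in product(list(closed), repeat=2):
            if b == c and (a, d) not in closed:
                closed.add((a, d))
                changed = True
    return closed


def awareness_scores(system: Set[Relation], reference: Set[Relation]) -> Tuple[float, float, float]:
    """Simplified closure-aware precision, recall, and F1."""
    ref_closed = closure(reference)
    sys_closed = closure(system)
    precision = len(system & ref_closed) / len(system) if system else 0.0
    recall = len(reference & sys_closed) / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


reference = {("e1", "e2"), ("e2", "e3")}
system = {("e1", "e3"), ("e2", "e3")}
print(awareness_scores(system, reference))
```

Exact string matching would mark the system's ("e1", "e3") as wrong (precision 0.5), whereas the closure-aware precision is 1.0 because the reference implies it.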

computational linguistic, proceedings, relation, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3539618.3591892

2301.04643

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.05)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Portugal > Porto > Porto (0.04)
(18 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)


Meta is giving researchers more access to Facebook and Instagram data

MIT Technology Review, Nov-21-2023, 11:00:00 GMT

In an interview, Meta's president of global affairs, Nick Clegg, said the tools "are really quite important" in that they provide, in a lot of ways, "the most comprehensive access to publicly available content across Facebook and Instagram of anything that we've built to date." The Content Library will also help the company meet new regulatory requirements and obligations on data sharing and transparency, as the company notes in a blog post Tuesday. The library and associated API were first released as a beta version several months ago and allow researchers to access near-real-time data about pages, posts, groups, and events on Facebook and creator and business accounts on Instagram, as well as the associated numbers of reactions, shares, comments, and post view counts. While all this data is publicly available--as in, anyone can see public posts, reactions, and comments on Facebook--the new library makes it easier for researchers to search and analyze this content at scale. Meta says that to protect user privacy, this data will be accessible only through a virtual "clean room" and not downloadable.

facebook and instagram data, meta, social media company, (7 more...)

MIT Technology Review

Country: North America > United States > New York (0.06)

Industry:

Information Technology > Services (0.70)
Government (0.58)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.40)


LowResource at BLP-2023 Task 2: Leveraging BanglaBert for Low Resource Sentiment Analysis of Bangla Language

Chakma, Aunabil, Hasan, Masum

arXiv.org Artificial Intelligence, Nov-21-2023

This paper describes the system of the LowResource Team for Task 2 of BLP-2023, which involves conducting sentiment analysis on a dataset composed of public posts and comments from diverse social media platforms. Our primary aim is to utilize BanglaBert, a BERT model pre-trained on a large Bangla corpus, using various strategies including fine-tuning, dropping random tokens, and using several external datasets. Our final model is an ensemble of the three best BanglaBert variations. Our system placed 3rd overall on the test set among 30 participating teams, with a score of 0.718. Additionally, we discuss promising approaches that did not perform well, namely task-adaptive pretraining and paraphrasing using BanglaT5. The training code and external datasets used for our system are publicly available at https://github.com/Aunabil4602/bnlp-workshop-task2-2023
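The abstract says the final model ensembles the three best BanglaBert variants but does not say how their outputs are combined. One common option is soft voting: average each model's class probabilities and take the argmax. The sketch assumes that scheme, a fixed label order, and made-up probability values; soft_vote and LABELS are hypothetical names, and the team may well have used a different combination rule (for example, majority voting).

```python
from typing import List, Sequence

# Assumed label order for every model's probability vector.
LABELS = ("negative", "neutral", "positive")


def soft_vote(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[str]:
    """Average class probabilities across models, then pick the argmax label.

    model_probs[m][i][k] is model m's probability of LABELS[k] for example i.
    Ties go to the earliest label in LABELS, since max() keeps the first maximum.
    """
    n_models = len(model_probs)
    n_examples = len(model_probs[0])
    if any(len(probs) != n_examples for probs in model_probs):
        raise ValueError("all models must score the same examples")
    predictions = []
    for i in range(n_examples):
        averaged = [
            sum(model_probs[m][i][k] for m in range(n_models)) / n_models
            for k in range(len(LABELS))
        ]
        best = max(range(len(LABELS)), key=lambda k: averaged[k])
        predictions.append(LABELS[best])
    return predictions


# Made-up outputs from three models on two posts.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]
model_b = [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3]]
model_c = [[0.5, 0.4, 0.1], [0.2, 0.3, 0.5]]
print(soft_vote([model_a, model_b, model_c]))
```

Averaging probabilities lets a confident model outweigh two uncertain ones, which hard majority voting cannot do.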

banglabert, computational linguistic, dataset, (14 more...)

arXiv.org Artificial Intelligence

2311.12735

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Bangladesh (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.74)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.74)


Sentiment Analysis of Twitter Posts on Global Conflicts

Sasikumar, Ujwal, Zaman, Ank, Mawlood-Yunis, Abdul-Rahman, Chatterjee, Prosenjit

arXiv.org Artificial Intelligence, Nov-20-2023

Sentiment analysis of social media data is an emerging field with vast applications in various domains. In this study, we developed a sentiment analysis model to analyze social media sentiment, especially tweets, during global conflicts. To establish our research experiment, we identified a recent global dispute on Twitter and collected around 31,000 filtered tweets over several months to analyze human sentiment worldwide.

dataset, sentiment, tweet, (11 more...)

arXiv.org Artificial Intelligence

2312.03715

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.15)
Europe > Ukraine (0.05)
North America > United States > Utah > Iron County > Cedar City (0.04)
(4 more...)

Genre: Research Report > New Finding (0.49)

Industry: Information Technology > Services (0.83)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.99)
(3 more...)


Optimal Strategies to Perform Multilingual Analysis of Social Content for a Novel Dataset in the Tourism Domain

Masson, Maxime, Agerri, Rodrigo, Sallaberry, Christian, Bessagnet, Marie-Noelle, Lacayrelle, Annig Le Parc, Roose, Philippe

arXiv.org Artificial Intelligence, Nov-20-2023

The rising influence of social media platforms in various domains, including tourism, has highlighted the growing need for efficient and automated natural language processing (NLP) approaches to take advantage of this valuable resource. However, the transformation of multilingual, unstructured, and informal texts into structured knowledge often poses significant challenges. In this work, we evaluate and compare few-shot, pattern-exploiting and fine-tuning machine learning techniques on large multilingual language models (LLMs) to establish the best strategy to address the lack of annotated data for 3 common NLP tasks in the tourism domain: (1) Sentiment Analysis, (2) Named Entity Recognition, and (3) Fine-grained Thematic Concept Extraction (linked to a semantic resource). Furthermore, we aim to ascertain the quantity of annotated examples required to achieve good performance in those 3 tasks, addressing a common challenge encountered by NLP researchers in the construction of domain-specific datasets. Extensive experimentation on a newly collected and annotated multilingual (French, English, and Spanish) dataset composed of tourism-related tweets shows that current few-shot learning techniques allow us to obtain competitive results for all three tasks with very little annotation data: 5 tweets per label (15 in total) for Sentiment Analysis, 10% of the tweets for location detection (around 160) and 13% (200 approx.) of the tweets annotated with thematic concepts, a highly fine-grained sequence labeling task based on an inventory of 315 classes. This comparative analysis, grounded in a novel dataset, paves the way for applying NLP to new domain-specific applications, reducing the need for manual annotations and circumventing the complexities of rule-based, ad hoc solutions.
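The reported few-shot budget for sentiment analysis is 5 tweets per label, 15 in total, which implies three labels. The abstract does not describe how those tweets were chosen; the sketch below shows one straightforward way to draw such a stratified k-shot training set, assuming uniform random sampling without replacement and a fixed seed for reproducibility. sample_k_per_label is a hypothetical helper, not the authors' code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (text, label)


def sample_k_per_label(examples: Sequence[Example], k: int, seed: int = 0) -> List[Example]:
    """Draw exactly k examples per label, without replacement (stratified k-shot)."""
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    rng = random.Random(seed)  # local RNG so results do not depend on global state
    subset: List[Example] = []
    for label in sorted(by_label):  # sorted for a deterministic label order
        pool = by_label[label]
        if len(pool) < k:
            raise ValueError(f"only {len(pool)} examples for label {label!r}, need {k}")
        subset.extend(rng.sample(pool, k))
    return subset


pool = [(f"tweet {i} ({label})", label)
        for label in ("negative", "neutral", "positive") for i in range(10)]
few_shot = sample_k_per_label(pool, k=5)
print(len(few_shot))  # 3 labels x 5 tweets = 15
```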

dataset, language model, tweet, (9 more...)

arXiv.org Artificial Intelligence

2311.14727

Country:

Europe > Spain > Basque Country (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
(9 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Media (1.00)
Consumer Products & Services > Travel (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


A Comprehensive Review on Sentiment Analysis: Tasks, Approaches and Applications

Kumar, Sudhanshu, Roy, Partha Pratim, Dogra, Debi Prosad, Kim, Byung-Gyu

arXiv.org Artificial Intelligence, Nov-19-2023

Sentiment analysis (SA) is an emerging field in text mining. It is the process of computationally identifying and categorizing opinions expressed in a piece of text over different social media platforms. Social media plays an essential role in understanding customers' attitudes towards products and services and in tracking the latest market trends. Most organizations depend on customers' responses and feedback to improve the products and services they offer. SA, or opinion mining, seems to be a promising research area for various domains. It plays a vital role in analyzing the big data generated daily in structured and unstructured formats over the internet. This survey paper defines sentiment and reviews its recent research and development in different domains, including voice, images, videos, and text. The challenges and opportunities of sentiment analysis are also discussed in the paper. Keywords: Sentiment Analysis, Machine Learning, Lexicon-based approach, Deep Learning, Natural Language Processing
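The survey's keywords include the lexicon-based approach. As a minimal illustration of that family of methods, the sketch sums word polarities from a lexicon, flips a word's polarity when it directly follows a negator, and maps the total to a label. The tiny LEXICON and NEGATORS sets are made up for the example; real lexicons are much larger and often weighted, and this is not a method taken from the survey.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
# Hypothetical negation cues that flip the next word's polarity.
NEGATORS = {"not", "never", "no"}


def lexicon_sentiment(text: str) -> str:
    """Classify text as positive, negative, or neutral by summing lexicon scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        # Simple negation handling: "not good" counts as negative.
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in ["I love this phone", "The battery is not good", "It arrived on Tuesday"]:
    print(sentence, "->", lexicon_sentiment(sentence))
```

Such rules need no training data, but they miss sarcasm and context, which is one reason the machine learning and deep learning approaches listed in the keywords are also surveyed.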

international conference, proceedings, sentiment analysis, (13 more...)

arXiv.org Artificial Intelligence

2311.1125

Country:

North America > United States > Missouri > Jackson County > Kansas City (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(3 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.68)

Industry:

Media (1.00)
Information Technology > Services (1.00)
Banking & Finance > Economy (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


SBTRec- A Transformer Framework for Personalized Tour Recommendation Problem with Sentiment Analysis

Ho, Ngai Lam, Lee, Roy Ka-Wei, Lim, Kwan Hui

arXiv.org Artificial Intelligence, Nov-18-2023

When traveling to an unfamiliar city for holidays, tourists often rely on guidebooks, travel websites, or recommendation systems to plan their daily itineraries and explore popular points of interest (POIs). However, these approaches may lack optimization in terms of time feasibility, localities, and user preferences. In this paper, we propose the SBTRec algorithm: a BERT-based Trajectory Recommendation with sentiment analysis, for recommending personalized sequences of POIs as itineraries. The key contributions of this work include analyzing users' check-ins and uploaded photos to understand the relationship between POI visits and distance. We introduce SBTRec, which encompasses sentiment analysis to improve recommendation accuracy by understanding users' preferences and satisfaction levels from reviews and comments about different POIs. Our proposed algorithms are evaluated against other sequence prediction methods using datasets from 8 cities. The results demonstrate that SBTRec achieves an average F1 score of 61.45%, outperforming baseline algorithms. The paper further discusses the flexibility of the SBTRec algorithm, its ability to adapt to different scenarios and cities without modification, and its potential for extension by incorporating additional information for more reliable predictions. Overall, SBTRec provides personalized and relevant POI recommendations, enhancing tourists' overall trip experiences. Future work includes fine-tuning personalized embeddings for users, with evaluation of users' comments on POIs, to further enhance prediction accuracy.
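The abstract reports an average F1 of 61.45% but does not define F1 for itineraries. In tour recommendation work, a common choice compares the set of POIs in the recommended itinerary with the set in the user's actual visit sequence, ignoring order; the sketch assumes that definition, and the paper may compute it differently. itinerary_f1 and the POI names are hypothetical.

```python
from typing import Sequence


def itinerary_f1(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Set-based F1 between recommended and actually visited POIs (order ignored)."""
    pred_set, true_set = set(predicted), set(actual)
    if not pred_set or not true_set:
        return 0.0
    hits = len(pred_set & true_set)
    if hits == 0:
        return 0.0
    precision = hits / len(pred_set)  # share of recommended POIs that were visited
    recall = hits / len(true_set)     # share of visited POIs that were recommended
    return 2 * precision * recall / (precision + recall)


predicted = ["museum", "park", "tower", "harbour"]
actual = ["museum", "tower", "market"]
print(itinerary_f1(predicted, actual))  # precision 1/2, recall 2/3, F1 = 4/7
```

An "average F1" would then be the mean of this score over all test itineraries.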

algorithm, itinerary, trajectory, (15 more...)

arXiv.org Artificial Intelligence

2311.11071

Country:

Asia > Singapore (0.04)
Europe > Switzerland (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(7 more...)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.83)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.83)
