AITopics | Information Extraction

Collaborating Authors

Information Extraction

News Overviews Instructional Materials AI-Alerts Classics

Meursault as a Data Point

arXiv.org Artificial IntelligenceNov-27-2025

Abstract--In an era dominated by datafication, the reduction of human experiences to quantifiable metrics raises profound philosophical and ethical questions. This paper explores these issues through the lens of Meursault, the protagonist of Albert Camus' The Stranger, whose emotionally detached existence epitomizes the existential concept of absurdity. Using natural language processing (NLP) techniques including emotion detection (BERT), sentiment analysis (V ADER), and named entity recognition (spaCy)-this study quantifies key events and behaviors in Meursault's life. Our analysis reveals the inherent limitations of applying algorithmic models to complex human experiences, particularly those rooted in existential alienation and moral ambiguity. By examining how modern AI tools misinterpret Meursault's actions and emotions, this research underscores the broader ethical dilemmas of reducing nuanced human narratives to data points, challenging the foundational assumptions of our data-driven society. The findings presented in this paper serve as a critique of the increasing reliance on data-driven narratives and advocate for incorporating humanistic values in artificial intelligence. In the digital age, the quantification of human experience has become a dominant paradigm, promising objectivity and predictive power [5]. However, this reductionist approach, known as datafication, risks obscuring the complexity and nuance inherent in human existence [6].

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.01364

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > India (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.69)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Language-Independent Sentiment Labelling with Distant Supervision: A Case Study for English, Sepedi and Setswana

Mabokela, Koena Ronny, Schlippe, Tim, Raborife, Mpho, Celik, Turgay

arXiv.org Artificial IntelligenceNov-26-2025

Sentiment analysis is a helpful task to automatically analyse opinions and emotions on various topics in areas such as AI for Social Good, AI in Education or marketing. While many of the sentiment analysis systems are developed for English, many African languages are classified as low-resource languages due to the lack of digital language resources like text labelled with corresponding sentiment classes. One reason for that is that manually labelling text data is time-consuming and expensive. Consequently, automatic and rapid processes are needed to reduce the manual effort as much as possible making the labelling process as efficient as possible. In this paper, we present and analyze an automatic language-independent sentiment labelling method that leverages information from sentiment-bearing emojis and words. Our experiments are conducted with tweets in the languages English, Sepedi and Setswana from SAfriSenti, a multilingual sentiment corpus for South African languages. We show that our sentiment labelling approach is able to label the English tweets with an accuracy of 66%, the Sepedi tweets with 69%, and the Setswana tweets with 63%, so that on average only 34% of the automatically generated labels remain to be corrected.

artificial intelligence, natural language, tweet, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.14746/amup.9788323241775

2511.19818

Country:

North America > United States (0.04)
Europe > Germany (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Africa > South Africa > Gauteng > Johannesburg (0.04)

Genre: Research Report (0.40)

Industry: Social Sector (0.70)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.76)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.76)

Add feedback

Breaking Bad: Norms for Valence, Arousal, and Dominance for over 10k English Multiword Expressions

Mohammad, Saif M.

arXiv.org Artificial IntelligenceNov-26-2025

Factor analysis studies have shown that the primary dimensions of word meaning are Valence (V), Arousal (A), and Dominance (D). Existing lexicons such as the NRC VAD Lexicon, published in 2018, include VAD association ratings for words. Here, we present a complement to it, which has human ratings of valence, arousal, and dominance for 10k English Multiword Expressions (MWEs) and their constituent words. We also increase the coverage of unigrams, especially words that have become more common since 2018. In all, the new NRC VAD Lexicon v2 now has entries for 10k MWEs and 25k words, in addition to the entries in v1. We show that the associations are highly reliable. We use the lexicon to examine emotional characteristics of MWEs, including: 1. The degree to which MWEs (idioms, noun compounds, and verb particle constructions) exhibit strong emotionality; 2. The degree of emotional compositionality in MWEs. The lexicon enables a wide variety of research in NLP, Psychology, Public Health, Digital Humanities, and Social Sciences. The NRC VAD Lexicon v2 is freely available through the project webpage: http://saifmohammad.com/WebPages/nrc-vad.html

artificial intelligence, lexicon, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.19816

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.14)
North America > Canada (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(9 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.68)

Add feedback

A Unified BERT-CNN-BiLSTM Framework for Simultaneous Headline Classification and Sentiment Analysis of Bangla News

Raquib, Mirza, Akash, Munazer Montasir, Ahmed, Tawhid, Murad, Saydul Akbar, Prity, Farida Siddiqi, Hossain, Mohammad Amzad, Polok, Asif Pervez, Rahimi, Nick

arXiv.org Artificial IntelligenceNov-25-2025

Abstract--In our daily lives, newspapers are an essential information source that impacts how the public talks about present-day issues. However, effectively navigating the vast amount of news content from different newspapers and online news portals can be challenging. Newspaper headlines with sentiment analysis tell us what the news is about (e.g., politics, sports) and how the news makes us feel (positive, negative, neutral). This helps us quickly understand the emotional tone of the news. This research presents a state-of-the-art approach to Bangla news headline classification combined with sentiment analysis applying Natural Language Processing (NLP) techniques, particularly the hybrid transfer learning model BERT -CNN-BiLSTM. We have explored a dataset called BAN-ABSA of 9014 news headlines, which is the first time that has been experimented with simultaneously in the headline and sentiment categorization in Bengali newspapers. Over this imbalanced dataset, we applied two experimental strategies: technique-1, where undersampling and oversampling are applied before splitting, and technique-2, where undersam-pling and oversampling are applied after splitting on the In technique-1 oversampling provided the strongest performance, both headline and sentiment, that is 78.57% and 73.43% respectively, while technique-2 delivered the highest result when trained directly on the original imbalanced dataset, both headline and sentiment, that is 81.37% and 64.46% respectively. The proposed model BERT -CNN-BiLSTM significantly outperforms all baseline models in classification tasks, and achieves new state-of-the-art results for Bangla news headline classification and sentiment analysis. These results demonstrate the importance of leveraging both the headline and sentiment datasets, and provide a strong baseline for Bangla text classification in low-resource. The rapid growth of digital content and the internet has necessitated robust natural language processing (NLP) systems that can analyze and comprehend human language properly. For instance, a language like Bangla, which is one of the most spoken languages in the world, has remained mostly overlooked as compared to English and other well-resourced languages. Newspapers continue to be one of the most significant information sources and the headlines play a crucial role by providing a quick idea of news content. At such times, headlines often convey a mood that can impact how readers interpret and react to news.

classification, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.18618

Country:

North America > United States > Mississippi > Forrest County > Hattiesburg (0.14)
North America > United States > Oklahoma (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.90)

Add feedback

Emotion-Enhanced Multi-Task Learning with LLMs for Aspect Category Sentiment Analysis

Chai, Yaping, Xie, Haoran, Qin, Joe S.

arXiv.org Artificial IntelligenceNov-25-2025

Aspect category sentiment analysis (ACSA) has achieved remarkable progress with large language models (LLMs), yet existing approaches primarily emphasize sentiment polarity while overlooking the underlying emotional dimensions that shape sentiment expressions. This limitation hinders the model's ability to capture fine-grained affective signals toward specific aspect categories. To address this limitation, we introduce a novel emotion-enhanced multi-task ACSA framework that jointly learns sentiment polarity and category-specific emotions grounded in Ekman's six basic emotions. Leveraging the generative capabilities of LLMs, our approach enables the model to produce emotional descriptions for each aspect category, thereby enriching sentiment representations with affective expressions. Furthermore, to ensure the accuracy and consistency of the generated emotions, we introduce an emotion refinement mechanism based on the Valence-Arousal-Dominance (VAD) dimensional framework. Specifically, emotions predicted by the LLM are projected onto a VAD space, and those inconsistent with their corresponding VAD coordinates are re-annotated using a structured LLM-based refinement strategy. Experimental results demonstrate that our approach significantly outperforms strong baselines on all benchmark datasets. This underlines the effectiveness of integrating affective dimensions into ACSA.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.19122

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
Asia > China > Hong Kong (0.06)
Europe > Ukraine (0.04)
(10 more...)

Genre: Research Report > New Finding (0.34)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

The Shifting Landscape of Vaccine Discourse: Insights From a Decade of Pre- to Post-COVID-19 Vaccine Posts on Social Media

Gyawali, Nikesh, Caragea, Doina, Caragea, Cornelia, Mohammad, Saif M.

arXiv.org Artificial IntelligenceNov-24-2025

In this work, we study English-language vaccine discourse in social media posts, specifically posts on X (formerly Twitter), in seven years before the COVID-19 outbreak (2013 to 2019) and three years after the outbreak was first reported (2020 to 2022). Drawing on theories from social cognition and the stereotype content model in Social Psychology, we analyze how English speakers talk about vaccines on social media to understand the evolving narrative around vaccines in social media posts. To do that, we first introduce a novel dataset comprising 18.7 million curated posts on vaccine discourse from 2013 to 2022. This extensive collection-filtered down from an initial 129 million posts through rigorous preprocessing-captures both pre-COVID and COVID-19 periods, offering valuable insights into the evolution of English-speaking X users' perceptions related to vaccines. Our analysis shows that the COVID-19 pandemic led to complex shifts in X users' sentiment and discourse around vaccines. We observe that negative emotion word usage decreased during the pandemic, with notable rises in usage of surprise, and trust related emotion words. Furthermore, vaccine-related language tended to use more warmth-focused words associated with trustworthiness, along with positive, competence-focused words during the early days of the pandemic, with a marked rise in negative word usage towards the end of the pandemic, possibly reflecting a growing vaccine hesitancy and skepticism.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.16832

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada (0.04)
Europe > United Kingdom (0.04)
(7 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Vaccines (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

MAPROC at AHaSIS Shared Task: Few-Shot and Sentence Transformer for Sentiment Analysis of Arabic Hotel Reviews

Zarnoufi, Randa

arXiv.org Artificial IntelligenceNov-20-2025

Sentiment analysis of Arabic dialects presents significant challenges due to linguistic diversity and the scarcity of annotated data. This paper describes our approach to the AHaSIS shared task, which focuses on sentiment analysis on Arabic dialects in the hospitality domain. The dataset comprises hotel reviews written in Moroccan and Saudi dialects, and the objective is to classify the reviewers sentiment as positive, negative, or neutral. We employed the SetFit (Sentence Transformer Fine-tuning) framework, a data-efficient few-shot learning technique. On the official evaluation set, our system achieved an F1 of 73%, ranking 12th among 26 participants. This work highlights the potential of few-shot learning to address data scarcity in processing nuanced dialectal Arabic text within specialized domains like hotel reviews.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.15291

Genre: Research Report (0.84)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Continuous sentiment scores for literary and multilingual contexts

Lyngbaek, Laurits, Feldkamp, Pascale, Bizzoni, Yuri, Nielbo, Kristoffer, Enevoldsen, Kenneth

arXiv.org Artificial IntelligenceNov-19-2025

Sentiment Analysis is widely used to quantify sentiment in text, but its application to literary texts poses unique challenges due to figurative language, stylistic ambiguity, as well as sentiment evocation strategies. Traditional dictionary-based tools often underperform, especially for low-resource languages, and transformer models, while promising, typically output coarse categorical labels that limit fine-grained analysis. We introduce a novel continuous sentiment scoring method based on concept vector projection, trained on multilingual literary data, which more effectively captures nuanced sentiment expressions across genres, languages, and historical periods. Our approach outperforms existing tools on English and Danish texts, producing sentiment scores whose distribution closely matches human ratings, enabling more accurate analysis and sentiment arc modeling in literature.

artificial intelligence, computational linguistic, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.1462

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
(16 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.72)

Add feedback

Categorical Emotions or Appraisals - Which Emotion Model Explains Argument Convincingness Better?

Greschner, Lynn, Bauer, Meike, Weber, Sabine, Klinger, Roman

arXiv.org Artificial IntelligenceNov-19-2025

The convincingness of an argument does not only depend on its structure (logos), the person who makes the argument (ethos), but also on the emotion that it causes in the recipient (pathos). While the overall intensity and categorical values of emotions in arguments have received considerable attention in the research community, we argue that the emotion an argument evokes in a recipient is subjective. It depends on the recipient's goals, standards, prior knowledge, and stance. Appraisal theories lend themselves as a link between the subjective cognitive assessment of events and emotions. They have been used in event-centric emotion analysis, but their suitability for assessing argument convincingness remains unexplored. In this paper, we evaluate whether appraisal theories are suitable for emotion analysis in arguments by considering subjective cognitive evaluations of the importance and impact of an argument on its receiver. Based on the annotations in the recently published ContArgA corpus, we perform zero-shot prompting experiments to evaluate the importance of gold-annotated and predicted emotions and appraisals for the assessment of the subjective convincingness labels. We find that, while categorical emotion information does improve convincingness prediction, the improvement is more pronounced with appraisals. This work presents the first systematic comparison between emotion models for convincingness prediction, demonstrating the advantage of appraisals, providing insights for theoretical and practical applications in computational argumentation.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.07162

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
(15 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.68)

Add feedback

Scaling Open-Weight Large Language Models for Hydropower Regulatory Information Extraction: A Systematic Analysis

Yoon, Hong-Jun, Ashraf, Faisal, Ruggles, Thomas A., Singh, Debjani

arXiv.org Artificial IntelligenceNov-18-2025

Information extraction from regulatory documents using large language models presents critical trade-offs between performance and computational resources. We evaluated seven open-weight models (0.6B-70B parameters) on hydropower licensing documentation to provide empirical deployment guidance. Our analysis identified a pronounced 14B parameter threshold where validation methods transition from ineffective (F1 $<$ 0.15) to viable (F1 = 0.64). Consumer-deployable models achieve 64\% F1 through appropriate validation, while smaller models plateau at 51\%. Large-scale models approach 77\% F1 but require enterprise infrastructure. We identified systematic hallucination patterns where perfect recall indicates extraction failure rather than success in smaller models. Our findings establish the first comprehensive resource-performance mapping for open-weight information extraction in regulatory contexts, enabling evidence-based model selection. These results provide immediate value for hydropower compliance while contributing insights into parameter scaling effects that generalize across information extraction tasks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.11821

Country:

North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
North America > United States > Montana > Roosevelt County (0.04)
North America > Canada > Alberta (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy > Renewable (1.00)
Energy > Power Industry (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback