AITopics | Information Extraction

Information Extraction

Information Extraction from Conversation Transcripts: Neuro-Symbolic vs. LLM

Kwak, Alice Saebom, Alexeeva, Maria, Hahn-Powell, Gus, Alcock, Keith, McLaughlin, Kevin, McCorkle, Doug, McNunn, Gabe, Surdeanu, Mihai

arXiv.org Artificial Intelligence, Oct-15-2025

The current trend in information extraction (IE) is to rely extensively on large language models, effectively discarding decades of experience in building symbolic or statistical IE systems. This paper compares a neuro-symbolic (NS) and an LLM-based IE system in the agricultural domain, evaluating them on nine interviews across pork, dairy, and crop subdomains. The LLM-based system outperforms the NS one (F1 total: 69.4 vs. 52.7; core: 63.0 vs. 47.2), where total includes all extracted information and core focuses on essential details. However, each system has trade-offs: the NS approach offers faster runtime, greater control, and high accuracy in context-free tasks but lacks generalizability, struggles with contextual nuances, and requires significant resources to develop and maintain. The LLM-based system achieves higher performance, faster deployment, and easier maintenance but has slower runtime, limited control, model dependency and hallucination risks. Our findings highlight the "hidden cost" of deploying NLP systems in real-world applications, emphasizing the need to balance performance, efficiency, and control.
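The F1 figures quoted above combine precision and recall into a single number. A minimal sketch of that calculation follows, assuming the standard harmonic-mean definition; the abstract does not say how extractions were matched against the gold annotations, so the function name `f1_score` and its count-based inputs are illustrative assumptions rather than the authors' evaluation code.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return F1 as a percentage: the harmonic mean of precision and recall.

    Hypothetical helper for illustration; inputs are raw extraction counts.
    """
    predicted = true_positives + false_positives  # everything the system extracted
    actual = true_positives + false_negatives     # everything in the gold standard
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)
```

For example, a system with 8 correct extractions, 2 spurious ones, and 2 misses has precision and recall of 0.8 each, so `f1_score(8, 2, 2)` is approximately 80.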

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2510.12023

Country:

North America > United States (1.00)
Europe (0.93)
Asia > Middle East > UAE (0.28)

Genre:

Research Report > New Finding (0.48)
Personal > Interview (0.46)

Industry: Food & Agriculture > Agriculture (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

MS-Mix: Unveiling the Power of Mixup for Multimodal Sentiment Analysis

Zhu, Hongyu, Chen, Lin, El-Yacoubi, Mounim A., Shang, Mingsheng

arXiv.org Artificial Intelligence, Oct-14-2025

Multimodal Sentiment Analysis (MSA) aims to identify and interpret human emotions by integrating information from heterogeneous data sources such as text, video, and audio. While deep learning models have advanced in network architecture design, they remain heavily limited by scarce multimodal annotated data. Although Mixup-based augmentation improves generalization in unimodal tasks, its direct application to MSA introduces critical challenges: random mixing often amplifies label ambiguity and semantic inconsistency due to the lack of emotion-aware mixing mechanisms. To overcome these issues, we propose MS-Mix, an adaptive, emotion-sensitive augmentation framework that automatically optimizes sample mixing in multimodal settings. The key components of MS-Mix include: (1) a Sentiment-Aware Sample Selection (SASS) strategy that effectively prevents semantic confusion caused by mixing samples with contradictory emotions; (2) a Sentiment Intensity Guided (SIG) module using multi-head self-attention to compute modality-specific mixing ratios dynamically based on their respective emotional intensities; and (3) a Sentiment Alignment Loss (SAL) that aligns the prediction distributions across modalities, and incorporates the Kullback-Leibler-based loss as an additional regularization term to train the emotion intensity predictor and the backbone network jointly. Extensive experiments on three benchmark datasets with six state-of-the-art backbones confirm that MS-Mix consistently outperforms existing methods, establishing a new standard for robust multimodal sentiment augmentation. The source code is available at: https://github.com/HongyuZhu-s/MS-Mix.
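For readers unfamiliar with the baseline that MS-Mix builds on, here is a minimal sketch of plain Mixup: two samples and their labels are blended with a coefficient drawn from a Beta distribution. This is the generic random-mixing technique, not the SASS, SIG, or SAL components described above; the function name `mixup`, the list-based feature vectors, and the default `alpha=0.2` are assumptions made for illustration.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two (features, label-vector) pairs with a Beta-sampled coefficient.

    Hypothetical helper: plain Mixup only, with no emotion-aware sample selection.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    # Convex combination of features and of (one-hot or soft) labels.
    x_mixed = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y_mixed = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mixed, y_mixed, lam
```

Because the two samples are chosen at random in this baseline, a strongly positive and a strongly negative example can be blended into one ambiguous training point, which is the label-ambiguity problem the abstract attributes to direct application of Mixup.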

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.11579

Country: Asia > China (0.15)

Genre:

Research Report (1.00)
Overview (0.68)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

CrosGrpsABS: Cross-Attention over Syntactic and Semantic Graphs for Aspect-Based Sentiment Analysis in a Low-Resource Language

Hossain, Md. Mithun, Hossain, Md. Shakil, Chaki, Sudipto, Hossain, Md. Rajib

arXiv.org Artificial Intelligence, Oct-14-2025

Aspect-Based Sentiment Analysis (ABSA) is a fundamental task in natural language processing, offering fine-grained insights into opinions expressed in text. Existing research has largely focused on resource-rich languages like English, leveraging large annotated datasets, pre-trained models, and language-specific tools. These resources are often unavailable for low-resource languages such as Bengali. The ABSA task in Bengali remains poorly explored and is further complicated by its unique linguistic characteristics and a lack of annotated data, pre-trained models, and optimized hyperparameters. To address these challenges, this research proposes CrosGrpsABS, a novel hybrid framework that leverages bidirectional cross-attention between syntactic and semantic graphs to enhance aspect-level sentiment classification. CrosGrpsABS combines transformer-based contextual embeddings with graph convolutional networks, built upon rule-based syntactic dependency parsing and semantic similarity computations. By employing bidirectional cross-attention, the model effectively fuses local syntactic structure with global semantic context, resulting in improved sentiment classification performance across both low- and high-resource settings. We evaluate CrosGrpsABS on four low-resource Bengali ABSA datasets and the high-resource English SemEval 2014 Task 4 dataset. CrosGrpsABS consistently outperforms existing approaches, achieving notable improvements, including a 0.93% F1-score increase for the Restaurant domain and a 1.06% gain for the Laptop domain in the SemEval 2014 Task 4 benchmark.

machine learning, natural language, sentiment analysis, (17 more...)

arXiv.org Artificial Intelligence

2505.19018

Country: Asia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Dynamic Span Interaction and Graph-Aware Memory for Entity-Level Sentiment Classification

Hossain, Md. Mithun, Sanjara, Hossain, Md. Shakil, Chaki, Sudipto

arXiv.org Artificial Intelligence, Oct-14-2025

Entity-level sentiment classification involves identifying the sentiment polarity linked to specific entities within text. This task poses several challenges: effectively modeling the subtle and complex interactions between entities and their surrounding sentiment expressions; capturing dependencies that may span across sentences; and ensuring consistent sentiment predictions for multiple mentions of the same entity through coreference resolution. Additionally, linguistic phenomena such as negation, ambiguity, and overlapping opinions further complicate the analysis. These complexities make entity-level sentiment classification a difficult problem, especially in real-world, noisy textual data. To address these issues, we propose SpanEIT, a novel framework integrating dynamic span interaction and graph-aware memory mechanisms for enhanced entity-sentiment relational modeling. SpanEIT builds span-based representations for entities and candidate sentiment phrases, employs bidirectional attention for fine-grained interactions, and uses a graph attention network to capture syntactic and co-occurrence relations. A coreference-aware memory module ensures entity-level consistency across documents. Experiments on FSAD, BARU, and IMDB datasets show SpanEIT outperforms state-of-the-art transformer and hybrid baselines in accuracy and F1 scores. Ablation and interpretability analyses validate the effectiveness of our approach, underscoring its potential for fine-grained sentiment analysis in applications like social media monitoring and customer feedback analysis.

machine learning, natural language, sentiment analysis, (16 more...)

arXiv.org Artificial Intelligence

2509.11604

Country: Asia (0.67)

Genre: Research Report (1.00)

Industry:

Information Technology (0.68)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

adb1d9fa8be4576d28703b396b82ba1b-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Oct-10-2025, 13:05:32 GMT

dataset, information, university, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Hubei Province > Wuhan (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
(5 more...)

Genre: Research Report (0.93)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance > Trading (1.00)
Government (0.93)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(4 more...)

Towards Robust Multimodal Sentiment Analysis with Incomplete Data

Neural Information Processing Systems, Oct-10-2025, 04:41:24 GMT

Recognizing that the language modality typically contains dense sentiment information, we consider it as the dominant modality and present an innovative Language-dominated Noise-resistant Learning Network (LNLN) to achieve robust MSA.

dataset, information, modality, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.41)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.41)

SenWave: A Fine-Grained Multi-Language Sentiment Analysis Dataset Sourced from COVID-19 Tweets

Yang, Qiang, Chen, Xiuying, Ma, Changsheng, Yin, Rui, Gao, Xin, Zhang, Xiangliang

arXiv.org Artificial Intelligence, Oct-10-2025

The global impact of the COVID-19 pandemic has highlighted the need for a comprehensive understanding of public sentiment and reactions. Despite the availability of numerous public datasets on COVID-19, some reaching volumes of up to 100 billion data points, challenges persist regarding the availability of labeled data and the presence of coarse-grained or inappropriate sentiment labels. In this paper, we introduce SenWave, a novel fine-grained multi-language sentiment analysis dataset specifically designed for analyzing COVID-19 tweets, featuring ten sentiment categories across five languages. The dataset comprises 10,000 annotated tweets each in English and Arabic, along with 30,000 translated tweets in Spanish, French, and Italian, derived from English tweets. Additionally, it includes over 105 million unlabeled tweets collected during various COVID-19 waves. To enable accurate fine-grained sentiment classification, we fine-tuned pre-trained transformer-based language models using the labeled tweets. Our study provides an in-depth analysis of the evolving emotional landscape across languages, countries, and topics, revealing significant insights over time. Furthermore, we assess the compatibility of our dataset with ChatGPT, demonstrating its robustness and versatility in various applications. Our dataset and accompanying code are publicly accessible at https://github.com/gitdevqiang/SenWave. We anticipate that this work will foster further exploration into fine-grained sentiment analysis for complex events within the NLP community, promoting more nuanced understanding and research innovations.

large language model, machine learning, sentiment, (17 more...)

arXiv.org Artificial Intelligence

2510.08214

Country:

North America > United States (1.00)
Asia > Middle East > Saudi Arabia (0.29)

Genre: Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

From Keywords to Clusters: AI-Driven Analysis of YouTube Comments to Reveal Election Issue Salience in 2024

Simoes, Raisa M., Kelly, Timoteo, Simoes, Eduardo J., Rao, Praveen

arXiv.org Artificial Intelligence, Oct-10-2025

Abstract: This paper aims to explore two competing data science methodologies to attempt answering the question, "Which issues contributed most to voters' choice in the 2024 presidential election?" The methodologies involve novel empirical evidence driven by artificial intelligence (AI) techniques. By using two distinct methods based on natural language processing and clustering analysis to mine over eight thousand user comments on election-related YouTube videos from one right-leaning journal, Wall Street Journal, and one left-leaning journal, New York Times, during pre-election week, we quantify the frequency of selected issue areas among user comments to infer which issues were most salient to potential voters in the seven days preceding the November 5th election. Empirically, we primarily demonstrate that immigration and democracy were the most frequently and consistently invoked issues in user comments on the analyzed YouTube videos, followed by the issue of identity politics, while inflation was significantly less frequently referenced. These results corroborate certain findings of post-election surveys but also refute the supposed importance of inflation as an election issue. This indicates that variations on opinion mining, with their analysis of raw user data online, can be more revealing than polling and surveys for analyzing election outcomes.

Keywords: artificial intelligence; opinion mining; clustering; vote choice; cleavages

1. Introduction

The Democrats lost both houses of Congress and the Presidency to Republicans in the 2024 election, with former president Donald Trump winning all seven swing states and the national popular vote, despite most pre-election polls giving Vice President Kamala Harris and President Trump a roughly equal chance of winning. Most post-election punditry and analysis in the legacy press and alternative media has attributed the Democrats' large loss to two main issues: inflation [59] and immigration [30]. However, a growing contingent of analysts has also attributed the election outcome to the Democratic party's association with cultural issues purportedly distant from the median voter's preferences, such as those alternatively aggregated under the concept of "identity" or "woke" politics [54, 56]. To this point, three post-election studies illustrate how voters associated Democrats with left-of-center ideas that were ostensibly distant from most voters' priorities. Survey research from the think tank Third Way demonstrates that Democrats, and thus Kamala Harris, were largely perceived as "too liberal" [15], while a study from More In Common polling over 5,000 Americans concluded that while inflation was the top concern for every major demographic group across both parties, Americans misperceived LGBT/transgender policies as the top policy priority for Democrats [37].
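The abstract's idea of quantifying how often selected issue areas appear in user comments can be illustrated with a simple keyword count. The sketch below is not the authors' pipeline, which is described only as NLP plus clustering analysis; the `ISSUE_KEYWORDS` lists and the `issue_frequencies` function are hypothetical and exist only to show the counting step.

```python
import re
from collections import Counter

# Hypothetical keyword lists; the paper's actual issue lexicon is not given here.
ISSUE_KEYWORDS = {
    "immigration": {"immigration", "border", "migrants"},
    "inflation": {"inflation", "prices", "groceries"},
    "democracy": {"democracy", "constitution"},
}


def issue_frequencies(comments):
    """Count, per issue, how many comments mention at least one of its keywords."""
    counts = Counter()
    for comment in comments:
        # Lowercase and split into word tokens so matching is case-insensitive.
        tokens = set(re.findall(r"[a-z']+", comment.lower()))
        for issue, keywords in ISSUE_KEYWORDS.items():
            if tokens & keywords:
                counts[issue] += 1  # each comment counts once per issue
    return counts
```

Counting each comment at most once per issue keeps a single long comment from dominating the totals; a frequency ranking of issues then follows from `counts.most_common()`.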

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.07821

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Industry:

Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.71)
(2 more...)

d921c3c762b1522c475ac8fc0811bb0f-AuthorFeedback.pdf

Neural Information Processing Systems, Oct-9-2025, 15:15:24 GMT

We wish to thank all of the reviewers for their time and thorough reading of our paper! We appreciate the reviewer's suggestions regarding clarity. We have added the suggested summary sentence "the key We started with binary sentiment classification, but are actively working on more tasks. RNN hidden states onto the top two PCs for two different input sequences that differ only by two tokens (replacing ' The trajectories start out the same as the initial tokens are identical. We have added a footnote noting this in the main text.

linear approximation, reviewer, rnn, (12 more...)

Neural Information Processing Systems

Technology: