AITopics | Information Extraction

Collaborating Authors

Information Extraction

News Overviews Instructional Materials AI-Alerts Classics

How do datasets, developers, and models affect biases in a low-resourced language?

arXiv.org Artificial IntelligenceJun-10-2025

Sociotechnical systems, such as language technologies, frequently exhibit identity-based biases. These biases exacerbate the experiences of historically marginalized communities and remain understudied in low-resource contexts. While models and datasets specific to a language or with multilingual support are commonly recommended to address these biases, this paper empirically tests the effectiveness of such approaches in the context of gender, religion, and nationality-based identities in Bengali, a widely spoken but low-resourced language. We conducted an algorithmic audit of sentiment analysis models built on mBERT and BanglaBERT, which were fine-tuned using all Bengali sentiment analysis (BSA) datasets from Google Dataset Search. Our analyses showed that BSA models exhibit biases across different identity categories despite having similar semantic content and structure. We also examined the inconsistencies and uncertainties arising from combining pre-trained models and datasets created by individuals from diverse demographic backgrounds. We connected these findings to the broader discussions on epistemic injustice, AI alignment, and methodological decisions in algorithmic audits.

artificial intelligence, natural language, social media, (19 more...)

arXiv.org Artificial Intelligence

2506.06816

Country:

Europe (1.00)
Asia (0.94)
North America > United States > Colorado (0.28)
North America > Canada > Ontario (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Health & Medicine (0.93)
Law > Civil Rights & Constitutional Law (0.93)
(2 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.67)
(2 more...)

Add feedback

Domain Lexical Knowledge-based Word Embedding Learning for Text Classification under Small Data

Zhu, Zixiao, Mao, Kezhi

arXiv.org Artificial IntelligenceJun-3-2025

Pre-trained language models such as BERT have been proved to be powerful in many natural language processing tasks. But in some text classification applications such as emotion recognition and sentiment analysis, BERT may not lead to satisfactory performance. This often happens in applications where keywords play critical roles in the prediction of class labels. Our investigation found that the root cause of the problem is that the context-based BERT embedding of the keywords may not be discriminative enough to produce discriminative text representation for classification. Motivated by this finding, we develop a method to enhance word embeddings using domain-specific lexical knowledge. The knowledge-based embedding enhancement model projects the BERT embedding into a new space where within-class similarity and between-class difference are maximized. To implement the knowledge-based word embedding enhancement model, we also develop a knowledge acquisition algorithm for automatically collecting lexical knowledge from online open sources. Experiment results on three classification tasks, including sentiment analysis, emotion recognition and question answering, have shown the effectiveness of our proposed word embedding enhancing model. The codes and datasets are in https://github.com/MidiyaZhu/KVWEFFER.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.01621

Country: Asia (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Confidence-Aware Self-Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

Luo, Yanxi, Wang, Shijin, Xu, Zhongxing, Li, Yulong, Tang, Feilong, Su, Jionglong

arXiv.org Artificial IntelligenceJun-3-2025

Multimodal sentiment analysis (MSA) aims to understand human sentiment through multimodal data. In real-world scenarios, practical factors often lead to uncertain modality missingness. Existing methods for handling modality missingness are based on data reconstruction or common subspace projections. However, these methods neglect the confidence in multimodal combinations and impose constraints on intra-class representation, hindering the capture of modality-specific information and resulting in suboptimal performance. To address these challenges, we propose a Confidence-Aware Self-Distillation (CASD) strategy that effectively incorporates multimodal probabilistic embeddings via a mixture of Student's $t$-distributions, enhancing its robustness by incorporating confidence and accommodating heavy-tailed properties. This strategy estimates joint distributions with uncertainty scores and reduces uncertainty in the student network by consistency distillation. Furthermore, we introduce a reparameterization representation module that facilitates CASD in robust multimodal learning by sampling embeddings from the joint distribution for the prediction module to calculate the task loss. As a result, the directional constraint from the loss minimization is alleviated by the sampled representation. Experimental results on three benchmark datasets demonstrate that our method achieves state-of-the-art performance.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.0149

Genre: Research Report (0.82)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.72)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Visualizing Public Opinion on X: A Real-Time Sentiment Dashboard Using VADER and DistilBERT

Reddy, Yanampally Abhiram, Agarwal, Siddhi, Parashar, Vikram, Arora, Arshiya

arXiv.org Artificial IntelligenceJun-3-2025

In the age of social media, understanding public sentiment toward major corporations is crucial for investors, policymakers, and researchers. This paper presents a comprehensive sentiment analysis system tailored for corporate reputation monitoring, combining Natural Language Processing (NLP) and machine learning techniques to accurately interpret public opinion in real time. The methodology integrates a hybrid sentiment detection framework leveraging both rule-based models (VADER) and transformer-based deep learning models (DistilBERT), applied to social media data from multiple platforms. The system begins with robust preprocessing involving noise removal and text normalization, followed by sentiment classification using an ensemble approach to ensure both interpretability and contextual accuracy. Results are visualized through sentiment distribution plots, comparative analyses, and temporal sentiment trends for enhanced interpretability. Our analysis reveals significant disparities in public sentiment across major corporations, with companies like Amazon (81.2) and Samsung (45.8) receiving excellent sentiment scores, while Microsoft (21.7) and Walmart (21.9) exhibit poor sentiment profiles. These findings demonstrate the utility of our multi-source sentiment framework in providing actionable insights regarding corporate public perception, enabling stakeholders to make informed strategic decisions based on comprehensive sentiment analysis.

machine learning, natural language, sentiment analysis, (18 more...)

arXiv.org Artificial Intelligence

2504.15448

Country:

North America > United States (0.28)
Europe (0.28)
Asia (0.28)

Genre: Research Report > New Finding (0.88)

Industry:

Government (1.00)
Information Technology > Services (0.68)
Banking & Finance > Trading (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

How to access and download your Facebook data

FOX NewsMay-30-2025, 10:00:04 GMT

Founder and Hedgehog CEO John Matze joined'FOX & Friends First' to discuss his optimism surrounding the community notes program, staying competitive globally with AI and the possibility of Oracle buying TikTok. Reviewing your Facebook data allows you to see what personal information Facebook has collected about you, helping you make informed decisions about your privacy settings. You might also need a copy of your data, which serves as a backup of your photos, messages and memories in case you lose access to your account or decide to delete it. Additionally, understanding what data Facebook stores can help you better comprehend how the platform uses your information for advertising and content personalization. Here's how to do it.

cyberguy, download, facebook data, (13 more...)

FOX News

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.87)

Add feedback

FCKT: Fine-Grained Cross-Task Knowledge Transfer with Semantic Contrastive Learning for Targeted Sentiment Analysis

Chen, Wei, Zhang, Zhao, Yuan, Meng, Xu, Kepeng, Zhuang, Fuzhen

arXiv.org Artificial IntelligenceMay-29-2025

In this paper, we address the task of targeted sentiment analysis (TSA), which involves two sub-tasks, i.e., identifying specific aspects from reviews and determining their corresponding sentiments. Aspect extraction forms the foundation for sentiment prediction, highlighting the critical dependency between these two tasks for effective cross-task knowledge transfer. While most existing studies adopt a multi-task learning paradigm to align task-specific features in the latent space, they predominantly rely on coarse-grained knowledge transfer. Such approaches lack fine-grained control over aspect-sentiment relationships, often assuming uniform sentiment polarity within related aspects. This oversimplification neglects contextual cues that differentiate sentiments, leading to negative transfer. To overcome these limitations, we propose FCKT, a fine-grained cross-task knowledge transfer framework tailored for TSA. By explicitly incorporating aspect-level information into sentiment prediction, FCKT achieves fine-grained knowledge transfer, effectively mitigating negative transfer and enhancing task performance. Experiments on three datasets, including comparisons with various baselines and large language models (LLMs), demonstrate the effectiveness of FCKT. The source code is available on https://github.com/cwei01/FCKT.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.2104

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline

Neural Information Processing SystemsMay-27-2025, 14:36:53 GMT

Existing video multi-modal sentiment analysis mainly focuses on the sentiment expression of people within the video, yet often neglects the induced sentiment of viewers while watching the videos. Induced sentiment of viewers is essential for inferring the public response to videos and has broad application in analyzing public societal sentiment, effectiveness of advertising and other areas. The micro videos and the related comments provide a rich application scenario for viewers' induced sentiment analysis. In light of this, we introduces a novel research task, Multimodal Sentiment Analysis for Comment Response of Video Induced(MSA-CRVI), aims to infer opinions and emotions according to comments response to micro video. Meanwhile, we manually annotate a dataset named Comment Sentiment toward to Micro Video (CSMV) to support this research.

artificial intelligence, comment response, natural language, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Towards Robust Multimodal Sentiment Analysis with Incomplete Data

Neural Information Processing SystemsMay-27-2025, 03:43:29 GMT

The field of Multimodal Sentiment Analysis (MSA) has recently witnessed an emerging direction seeking to tackle the issue of data incompleteness. Recognizing that the language modality typically contains dense sentiment information, we consider it as the dominant modality and present an innovative Language-dominated Noise-resistant Learning Network (LNLN) to achieve robust MSA. The proposed LNLN features a dominant modality correction (DMC) module and dominant modality based multimodal learning (DMML) module, which enhances the model's robustness across various noise scenarios by ensuring the quality of dominant modality representations. Aside from the methodical design, we perform comprehensive experiments under random data missing scenarios, utilizing diverse and meaningful settings on several popular datasets (e.g., MOSI, MOSEI, and SIMS), providing additional uniformity, transparency, and fairness compared to existing evaluations in the literature. Empirically, LNLN consistently outperforms existing baselines, demonstrating superior performance across these challenging and extensive evaluation metrics.

artificial intelligence, multimodal sentiment analysis, natural language, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.66)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.66)

Add feedback

Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning

Neural Information Processing SystemsMay-26-2025, 20:43:29 GMT

Multimodal Sentiment Analysis (MSA) is an important research area that aims to understand and recognize human sentiment through multiple modalities. The complementary information provided by multimodal fusion promotes better sentiment analysis compared to utilizing only a single modality. Nevertheless, in real-world applications, many unavoidable factors may lead to situations of uncertain modality missing, thus hindering the effectiveness of multimodal modeling and degrading the model's performance. To this end, we propose a Hierarchical Representation Learning Framework (HRLF) for the MSA task under uncertain missing modalities. Specifically, we propose a fine-grained representation factorization module that sufficiently extracts valuable sentiment information by factorizing modality into sentiment-relevant and modality-specific representations through crossmodal translation and sentiment semantic reconstruction.

artificial intelligence, multimodal sentiment analysis, natural language, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.89)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.89)

Add feedback

MPL: Multiple Programming Languages with Large Language Models for Information Extraction

Li, Bo, Fang, Gexiang, Ye, Wei, Xu, Zhenghua, Zhang, Jinglei, Cheng, Hao, Zhang, Shikun

arXiv.org Artificial IntelligenceMay-23-2025

Recent research in information extraction (IE) focuses on utilizing code-style inputs to enhance structured output generation. The intuition behind this is that the programming languages (PLs) inherently exhibit greater structural organization than natural languages (NLs). This structural advantage makes PLs particularly suited for IE tasks. Nevertheless, existing research primarily focuses on Python for code-style simulation, overlooking the potential of other widely-used PLs (e.g., C++ and Java) during the supervised fine-tuning (SFT) phase. In this research, we propose \textbf{M}ultiple \textbf{P}rogramming \textbf{L}anguages with large language models for information extraction (abbreviated as \textbf{MPL}), a novel framework that explores the potential of incorporating different PLs in the SFT phase. Additionally, we introduce \texttt{function-prompt} with virtual running to simulate code-style inputs more effectively and efficiently. Experimental results on a wide range of datasets demonstrate the effectiveness of MPL. Furthermore, we conduct extensive experiments to provide a comprehensive analysis. We have released our code for future research.

arxiv preprint arxiv, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2505.16107

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)

Add feedback