AITopics | semeval

Collaborating Authors

semeval

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Humans Hallucinate Too: Language Models Identify and Correct Subjective Annotation Errors With Label-in-a-Haystack Prompts

Chochlakis, Georgios, Wu, Peter, Bedi, Arjun, Ma, Marcus, Lerman, Kristina, Narayanan, Shrikanth

arXiv.org Artificial IntelligenceNov-12-2025

Modeling complex subjective tasks in Natural Language Processing, such as recognizing emotion and morality, is considerably challenging due to significant variation in human annotations. This variation often reflects reasonable differences in semantic interpretations rather than mere noise, necessitating methods to distinguish between legitimate subjectivity and error. We address this challenge by exploring label verification in these contexts using Large Language Models (LLMs). First, we propose a simple In-Context Learning binary filtering baseline that estimates the reasonableness of a document-label pair. We then introduce the Label-in-a-Haystack setting: the query and its label(s) are included in the demonstrations shown to LLMs, which are prompted to predict the label(s) again, while receiving task-specific instructions (e.g., emotion recognition) rather than label copying. We show how the failure to copy the label(s) to the output of the LLM are task-relevant and informative. Building on this, we propose the Label-in-a-Haystack Rectification (LiaHR) framework for subjective label correction: when the model outputs diverge from the reference gold labels, we assign the generated labels to the example instead of discarding it. This approach can be integrated into annotation pipelines to enhance signal-to-noise ratios. Comprehensive analyses, human evaluations, and ecological validity studies verify the utility of LiaHR for label correction. Code is available at https://github.com/gchochla/liahr.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2025.emnlp-main.993

2505.17222

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

Hybrid Extractive Abstractive Summarization for Multilingual Sentiment Analysis

Krasitskii, Mikhail, Sidorov, Grigori, Kolesnikova, Olga, Hernandez, Liliana Chanona, Gelbukh, Alexander

arXiv.org Artificial IntelligenceJun-10-2025

We propose a hybrid approach for multilingual sentiment analysis that combines extractive and abstractive summarization to address the limitations of standalone methods. The model integrates TF-IDF-based extraction with a fine-tuned XLM-R abstractive module, enhanced by dynamic thresholding and cultural adaptation. Experiments across 10 languages show significant improvements over baselines, achieving 0.90 accuracy for English and 0.84 for low-resource languages. The approach also demonstrates 22% greater computational efficiency than traditional methods. Practical applications include real-time brand monitoring and cross-cultural discourse analysis. Future work will focus on optimization for low-resource languages via 8-bit quantization.

artificial intelligence, natural language, sentiment analysis, (17 more...)

arXiv.org Artificial Intelligence

2506.06929

Country: North America > Mexico (0.17)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

HalluSearch at SemEval-2025 Task 3: A Search-Enhanced RAG Pipeline for Hallucination Detection

Abdallah, Mohamed A., El-Beltagy, Samhaa R.

arXiv.org Artificial IntelligenceApr-15-2025

In this paper, we present HalluSearch, a multilingual pipeline designed to detect fabricated text spans in Large Language Model (LLM) outputs. Developed as part of Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes, HalluSearch couples retrieval-augmented verification with fine-grained factual splitting to identify and localize hallucinations in fourteen different languages. Empirical evaluations show that HalluSearch performs competitively, placing fourth in both English (within the top ten percent) and Czech. While the system's retrieval-based strategy generally proves robust, it faces challenges in languages with limited online coverage, underscoring the need for further research to ensure consistent hallucination detection across diverse linguistic contexts.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2504.10168

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

An Empirical Study of Causal Relation Extraction Transfer: Design and Data

Anuyah, Sydney, Vanschaik, Jack, Jain, Palak, Lehman, Sawyer, Chakraborty, Sunandan

arXiv.org Artificial IntelligenceMar-8-2025

We conduct an empirical analysis of neural network architectures and data transfer strategies for causal relation extraction. By conducting experiments with various contextual embedding layers and architectural components, we show that a relatively straightforward BioBERT-BiGRU relation extraction model generalizes better than other architectures across varying web-based sources and annotation strategies. Furthermore, we introduce a metric for evaluating transfer performance, $F1_{phrase}$ that emphasizes noun phrase localization rather than directly matching target tags. Using this metric, we can conduct data transfer experiments, ultimately revealing that augmentation with data with varying domains and annotation styles can improve performance. Data augmentation is especially beneficial when an adequate proportion of implicitly and explicitly causal sentences are included.

dataset, extraction, relation extraction, (13 more...)

arXiv.org Artificial Intelligence

2503.06076

Country: North America > United States > Indiana > Marion County > Indianapolis (0.04)

Genre: Research Report > Experimental Study (0.47)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Robust AI-Generated Text Detection by Restricted Embeddings

Kuznetsov, Kristian, Tulchinskii, Eduard, Kushnareva, Laida, Magai, German, Barannikov, Serguei, Nikolenko, Sergey, Piontkovskaya, Irina

arXiv.org Artificial IntelligenceOct-10-2024

Growing amount and quality of AI-generated texts makes detecting such content more difficult. In most real-world scenarios, the domain (style and topic) of generated data and the generator model are not known in advance. In this work, we focus on the robustness of classifier-based detectors of AI-generated text, namely their ability to transfer to unseen generators or semantic domains. We investigate the geometry of the embedding space of Transformer-based text encoders and show that clearing out harmful linear subspaces helps to train a robust classifier, ignoring domain-specific spurious features. We investigate several subspace decomposition and feature selection strategies and achieve significant improvements over state of the art methods in cross-domain and cross-generator transfer. Our best approaches for head-wise and coordinate-based subspace removal increase the mean out-of-distribution (OOD) classification score by up to 9% and 14% in particular setups for RoBERTa and BERT embeddings respectively. We release our code and data: https://github.com/SilverSolver/RobustATD

accuracy, computational linguistic, dataset, (16 more...)

arXiv.org Artificial Intelligence

2410.08113

Country:

Asia > Russia (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
(8 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Uncovering Differences in Persuasive Language in Russian versus English Wikipedia

Li, Bryan, Panasyuk, Aleksey, Callison-Burch, Chris

arXiv.org Artificial IntelligenceSep-27-2024

We study how differences in persuasive language across Wikipedia articles, written in either English and Russian, can uncover each culture's distinct perspective on different subjects. We develop a large language model (LLM) powered system to identify instances of persuasive language in multilingual texts. Instead of directly prompting LLMs to detect persuasion, which is subjective and difficult, we propose to reframe the task to instead ask high-level questions (HLQs) which capture different persuasive aspects. Importantly, these HLQs are authored by LLMs themselves. LLMs over-generate a large set of HLQs, which are subsequently filtered to a small set aligned with human labels for the original task. We then apply our approach to a large-scale, bilingual dataset of Wikipedia articles (88K total), using a two-stage identify-then-extract prompting strategy to find instances of persuasion. We quantify the amount of persuasion per article, and explore the differences in persuasion through several experiments on the paired articles. Notably, we generate rankings of articles by persuasion in both languages. These rankings match our intuitions on the culturally-salient subjects; Russian Wikipedia highlights subjects on Ukraine, while English Wikipedia highlights the Middle East. Grouping subjects into larger topics, we find politically-related events contain more persuasion than others. We further demonstrate that HLQs obtain similar performance when posed in either English or Russian. Our methodology enables cross-lingual, cross-cultural understanding at scale, and we release our code, prompts, and data.

large language model, machine learning, persuasion, (21 more...)

arXiv.org Artificial Intelligence

2409.19148

Country:

Asia > Russia (0.69)
Europe > Middle East (0.24)
Africa > Middle East (0.24)
(7 more...)

Genre: Research Report (0.70)

Industry:

Media (1.00)
Law Enforcement & Public Safety (0.68)
Government > Regional Government > North America Government > United States Government (0.67)
(2 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

E2MoCase: A Dataset for Emotional, Event and Moral Observations in News Articles on High-impact Legal Cases

Greco, Candida M., Zangari, Lorenzo, Picca, Davide, Tagarelli, Andrea

arXiv.org Artificial IntelligenceSep-13-2024

The way media reports on legal cases can significantly shape public opinion, often embedding subtle biases that influence societal views on justice and morality. Analyzing these biases requires a holistic approach that captures the emotional tone, moral framing, and specific events within the narratives. In this work we introduce E2MoCase, a novel dataset designed to facilitate the integrated analysis of emotions, moral values, and events within legal narratives and media coverage. By leveraging advanced models for emotion detection, moral value identification, and event extraction, E2MoCase offers a multi-dimensional perspective on how legal cases are portrayed in news articles.

e2mocase, emotion, paragraph, (17 more...)

arXiv.org Artificial Intelligence

2409.09001

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
North America > United States > Texas > Dallas County > Dallas (0.04)
(7 more...)

Genre: Research Report (0.82)

Industry:

Media > News (1.00)
Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Communications (0.94)

Add feedback

TM-TREK at SemEval-2024 Task 8: Towards LLM-Based Automatic Boundary Detection for Human-Machine Mixed Text

Qu, Xiaoyan, Meng, Xiangfeng

arXiv.org Artificial IntelligenceMar-31-2024

With the increasing prevalence of text generated by large language models (LLMs), there is a growing concern about distinguishing between LLM-generated and human-written texts in order to prevent the misuse of LLMs, such as the dissemination of misleading information and academic dishonesty. Previous research has primarily focused on classifying text as either entirely human-written or LLM-generated, neglecting the detection of mixed texts that contain both types of content. This paper explores LLMs' ability to identify boundaries in human-written and machine-generated mixed texts. We approach this task by transforming it into a token classification problem and regard the label turning point as the boundary. Notably, our ensemble model of LLMs achieved first place in the 'Human-Machine Mixed Text Detection' sub-task of the SemEval'24 Competition Task 8. Additionally, we investigate factors that influence the capability of LLMs in detecting boundaries within mixed texts, including the incorporation of extra layers on top of LLMs, combination of segmentation loss, and the impact of pretraining. Our findings aim to provide valuable insights for future research in this area.

boundary detection, detection, llm, (13 more...)

arXiv.org Artificial Intelligence

2404.00899

Country:

North America > Mexico (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.85)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges

Winata, Genta Indra, Aji, Alham Fikri, Yong, Zheng-Xin, Solorio, Thamar

arXiv.org Artificial IntelligenceMay-24-2023

Code-Switching, a common phenomenon in written text and conversation, has been studied over decades by the natural language processing (NLP) research community. Initially, code-switching is intensively explored by leveraging linguistic theories and, currently, more machine-learning oriented approaches to develop models. We introduce a comprehensive systematic survey on code-switching research in natural language processing to understand the progress of the past decades and conceptualize the challenges and tasks on the code-switching topic. Finally, we summarize the trends and findings and conclude with a discussion for future direction and open questions for further investigation.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2212.0966

Country:

Asia > Indonesia > Bali (0.05)
Asia > East Asia (0.04)
South America > Brazil (0.04)
(12 more...)

Genre: Overview (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(4 more...)

Add feedback

UniCausal: Unified Benchmark and Repository for Causal Text Mining

Tan, Fiona Anting, Zuo, Xinyu, Ng, See-Kiong

arXiv.org Artificial IntelligenceApr-14-2023

Current causal text mining datasets vary in objectives, data coverage, and annotation schemes. These inconsistent efforts prevent modeling capabilities and fair comparisons of model performance. Furthermore, few datasets include cause-effect span annotations, which are needed for end-to-end causal relation extraction. To address these issues, we propose UniCausal, a unified benchmark for causal text mining across three tasks: (I) Causal Sequence Classification, (II) Cause-Effect Span Detection and (III) Causal Pair Classification. We consolidated and aligned annotations of six high quality, mainly human-annotated, corpora, resulting in a total of 58,720, 12,144 and 69,165 examples for each task respectively. Since the definition of causality can be subjective, our framework was designed to allow researchers to work on some or all datasets and tasks. To create an initial benchmark, we fine-tuned BERT pre-trained language models to each task, achieving 70.10% Binary F1, 52.42% Macro F1, and 84.68% Binary F1 scores respectively.

data mining, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2208.09163

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
(19 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining > Text Mining (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback