AITopics | language resource and evaluation

Collaborating Authors

language resource and evaluation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ce9e92e3de2372a4b93353eb7f3dc0bd-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-19-2026, 12:00:31 GMT

computational linguistic, corpus, dataset, (11 more...)

Neural Information Processing Systems

Country:

Europe > Slovenia (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
(29 more...)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
(4 more...)

Add feedback

Findings of the Fourth Shared Task on Multilingual Coreference Resolution: Can LLMs Dethrone Traditional Approaches?

Novák, Michal, Konopík, Miloslav, Nedoluzhko, Anna, Popel, Martin, Pražák, Ondřej, Sido, Jakub, Straka, Milan, Žabokrtský, Zdeněk, Zeman, Daniel

arXiv.org Artificial IntelligenceNov-7-2025

The paper presents an overview of the fourth edition of the Shared Task on Multilingual Coreference Resolution, organized as part of the CODI-CRAC 2025 workshop. As in the previous editions, participants were challenged to develop systems that identify mentions and cluster them according to identity coreference. A key innovation of this year's task was the introduction of a dedicated Large Language Model (LLM) track, featuring a simplified plaintext format designed to be more suitable for LLMs than the original CoNLL-U representation. The task also expanded its coverage with three new datasets in two additional languages, using version 1.3 of CorefUD - a harmonized multilingual collection of 22 datasets in 17 languages. In total, nine systems participated, including four LLM-based approaches (two fine-tuned and two using few-shot adaptation). While traditional systems still kept the lead, LLMs showed clear potential, suggesting they may soon challenge established approaches in future editions.

artificial intelligence, large language model, natural language, (12 more...)

arXiv.org Artificial Intelligence

2509.17796

Country:

Europe > Czechia (0.46)
Europe > France (0.28)
Europe > Germany (0.28)
(3 more...)

Genre:

Overview (0.86)
Research Report (0.81)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

ParsTranslit: Truly Versatile Tajik-Farsi Transliteration

Merchant, Rayyan, Tang, Kevin

arXiv.org Artificial IntelligenceOct-10-2025

As a digraphic language, the Persian language utilizes two written standards: Perso-Arabic in Afghanistan and Iran, and Tajik-Cyrillic in Tajikistan. Despite the significant similarity between the dialects of each country, script differences prevent simple one-to-one mapping, hindering written communication and interaction between Tajikistan and its Persian-speaking ``siblings''. To overcome this, previously-published efforts have investigated machine transliteration models to convert between the two scripts. Unfortunately, most efforts did not use datasets other than those they created, limiting these models to certain domains of text such as archaic poetry or word lists. A truly usable transliteration system must be capable of handling varied domains, meaning that suck models lack the versatility required for real-world usage. The contrast in domain between data also obscures the task's true difficulty. We present a new state-of-the-art sequence-to-sequence model for Tajik-Farsi transliteration trained across all available datasets, and present two datasets of our own. Our results across domains provide clearer understanding of the task, and set comprehensive comparable leading benchmarks. Overall, our model achieves chrF++ and Normalized CER scores of 87.91 and 0.05 from Farsi to Tajik and 92.28 and 0.04 from Tajik to Farsi. Our model, data, and code are available at https://anonymous.4open.science/r/ParsTranslit-FB30/.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.0752

Country:

North America > United States > Minnesota (0.28)
Asia > Middle East > UAE (0.28)
Asia > Middle East > Iran (0.24)

Genre: Research Report > New Finding (0.48)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Add feedback

LLM Hallucination Detection: HSAD

Li, JinXin, Tu, Gang, Hu, JunJie

arXiv.org Artificial IntelligenceOct-9-2025

Although Large Language Models have demonstrated powerful capabilities in a wide range of tasks such as language understanding and code generation, the frequent occurrence of hallucinations during the generation process has become a significant impediment to their deployment in critical application scenarios. Current mainstream hallucination detection methods rely on factual consistency verification or static hidden layer features. The former is constrained by the scope of knowledge coverage, while the latter struggles to capture reasoning biases during the inference process. To address these issues, and inspired by signal analysis methods in cognitive neuroscience, this paper proposes a hallucination detection method based on the frequency-domain analysis of hidden layer temporal signals, named HSAD (\textbf{H}idden \textbf{S}ignal \textbf{A}nalysis-based \textbf{D}etection). First, by treating the LLM's reasoning process as a cognitive journey that unfolds over time, we propose modeling and simulating the human process of signal perception and discrimination in a deception-detection scenario through hidden layer temporal signals. Next, The Fast Fourier Transform is applied to map these temporal signals into the frequency domain to construct spectral features, which are used to capture anomalies that arise during the reasoning process; analysis experiments on these spectral features have proven the effectiveness of this approach. Finally, a hallucination detection algorithm is designed based on these spectral features to identify hallucinations in the generated content. By effectively combining the modeling of the reasoning process with frequency-domain feature extraction, the HSAD method overcomes the limitations of existing approaches in terms of knowledge coverage and the detection of reasoning biases, demonstrating higher detection accuracy and robustness.

computational linguistic, large language model, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2509.2358

Country:

North America > Canada (0.28)
Europe > Austria (0.28)
Asia > Middle East > UAE (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Add feedback

DeDisCo at the DISRPT 2025 Shared Task: A System for Discourse Relation Classification

Ju, Zhuoxuan, Wu, Jingni, Purushothama, Abhishek, Zeldes, Amir

arXiv.org Artificial IntelligenceSep-23-2025

This paper presents DeDisCo, Georgetown University's entry in the DISRPT 2025 shared task on discourse relation classification. We test two approaches, using an mt5-based encoder and a decoder based approach using the openly available Qwen model. We also experiment on training with augmented dataset for low-resource languages using matched data translated automatically from English, as well as using some additional linguistic features inspired by entries in previous editions of the Shared Task. Our system achieves a macro-accuracy score of 71.28, and we provide some interpretation and error analysis for our results.

computational linguistic, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.11498

Country:

Europe (1.00)
South America (0.67)
North America > United States > Maryland (0.28)
Asia > Japan > Honshū (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

CLaC at DISRPT 2025: Hierarchical Adapters for Cross-Framework Multi-lingual Discourse Relation Classification

Turk, Nawar, Comitogianni, Daniele, Kosseim, Leila

arXiv.org Artificial IntelligenceSep-23-2025

We present our submission to Task 3 (Discourse Relation Classification) of the DISRPT 2025 shared task. Task 3 introduces a unified set of 17 discourse relation labels across 39 corpora in 16 languages and six discourse frameworks, posing significant multilingual and cross-formalism challenges. We first benchmark the task by fine-tuning multilingual BERT-based models (mBERT, XLM-RoBERTa-Base, and XLM-RoBERTa-Large) with two argument-ordering strategies and progressive unfreezing ratios to establish strong baselines. We then evaluate prompt-based large language models (namely Claude Opus 4.0) in zero-shot and few-shot settings to understand how LLMs respond to the newly proposed unified labels. Finally, we introduce HiDAC, a Hierarchical Dual-Adapter Contrastive learning model. Results show that while larger transformer models achieve higher accuracy, the improvements are modest, and that unfreezing the top 75% of encoder layers yields performance comparable to full fine-tuning while training far fewer parameters. Prompt-based models lag significantly behind fine-tuned transformers, and HiDAC achieves the highest overall accuracy (67.5%) while remaining more parameter-efficient than full fine-tuning.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2509.16903

Country:

North America > United States (1.00)
Europe (1.00)
North America > Canada (0.69)
(2 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

UPRPRC: Unified Pipeline for Reproducing Parallel Resources -- Corpus from the United Nations

Lu, Qiuyang, Shen, Fangjian, Tang, Zhengkai, Liu, Qiang, Cheng, Hexuan, Liu, Hui, Wen, Wushao

arXiv.org Artificial IntelligenceSep-22-2025

The quality and accessibility of multilingual datasets are crucial for advancing machine translation. However, previous corpora built from United Nations documents have suffered from issues such as opaque process, difficulty of reproduction, and limited scale. To address these challenges, we introduce a complete end-to-end solution, from data acquisition via web scraping to text alignment. The entire process is fully reproducible, with a minimalist single-machine example and optional distributed computing steps for scalability. At its core, we propose a new Graph-Aided Paragraph Alignment (GAPA) algorithm for efficient and flexible paragraph-level alignment. The resulting corpus contains over 713 million English tokens, more than doubling the scale of prior work. To the best of our knowledge, this represents the largest publicly available parallel corpus composed entirely of human-translated, non-AI-generated content. Our code and corpus are accessible under the MIT License.

artificial intelligence, natural language, paragraph, (16 more...)

arXiv.org Artificial Intelligence

2509.15789

Country: Europe (0.93)

Genre: Research Report (0.40)

Industry: Government > Intergovernmental Programs (0.63)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

NADI 2025: The First Multidialectal Arabic Speech Processing Shared Task

Talafha, Bashar, Toyin, Hawau Olamide, Sullivan, Peter, Elmadany, AbdelRahim, Juma, Abdurrahman, Djanibekov, Amirbek, Zhang, Chiyu, Alshehhi, Hamad, Aldarmaki, Hanan, Jarrar, Mustafa, Habash, Nizar, Abdul-Mageed, Muhammad

arXiv.org Artificial IntelligenceSep-5-2025

We present the findings of the sixth Nuanced Arabic Dialect Identification (NADI 2025) Shared Task, which focused on Arabic speech dialect processing across three subtasks: spoken dialect identification (Subtask 1), speech recognition (Subtask 2), and diacritic restoration for spoken dialects (Subtask 3). A total of 44 teams registered, and during the testing phase, 100 valid submissions were received from eight unique teams. The distribution was as follows: 34 submissions for Subtask 1 "five teamsæ, 47 submissions for Subtask 2 "six teams", and 19 submissions for Subtask 3 "two teams". The best-performing systems achieved 79.8% accuracy on Subtask 1, 35.68/12.20 WER/CER (overall average) on Subtask 2, and 55/13 WER/CER on Subtask 3. These results highlight the ongoing challenges of Arabic dialect speech processing, particularly in dialect identification, recognition, and diacritic restoration. We also summarize the methods adopted by participating teams and briefly outline directions for future editions of NADI.

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.02038

Country:

Europe (1.00)
Africa > Middle East (0.68)
Asia > Middle East > UAE (0.46)
North America > United States > Minnesota (0.28)

Genre:

Overview (0.67)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Social Media (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Toward a Better Localization of Princeton WordNet

Freihat, Abed Alhakim

arXiv.org Artificial IntelligenceAug-26-2025

As Princeton WordNet continues to gain significance as a semantic lexicon in Natural Language Processing, the need for its localization and for ensuring the quality of this process has become increasingly critical. Existing efforts remain limited in both scale and rigor, and there is a notable absence of studies addressing the accuracy of localization or its alignment with the cultural context of Arabic. This paper proposes a structured framework for the localization of Princeton WordNet, detailing the stages and procedures required to achieve high-quality results without compromising cultural authenticity. We further present our experience in applying this framework, reporting outcomes from the localization of 10,000 synsets.

artificial intelligence, natural language, proceedings, (16 more...)

arXiv.org Artificial Intelligence

2508.18134

Country: Europe (0.93)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.67)

Add feedback

The BigScience ROOTS Corpus: A1.6TB Composite Multilingual Dataset

Neural Information Processing SystemsAug-19-2025, 00:51:25 GMT

As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Slovenia (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
(29 more...)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
(4 more...)

Add feedback