AITopics

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
(2 more...)

Neural Information Processing SystemsDec-24-2025, 15:21:20 GMT

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content

Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data includes encyclopedic documents that harbor a vast amount of general knowledge (, Wikipedia) but also potentially overlap with benchmark datasets used for evaluating LLMs. Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset named RepLiQA, suited for question-answering and topic retrieval tasks. RepLiQA is a collection of five splits of test sets, four of which have not been released to the internet or exposed to LLM APIs prior to this publication. Each sample in RepLiQA comprises (1) a reference document crafted by a human annotator and depicting an imaginary scenario (, a news article) absent from the internet; (2) a question about the document's topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph extracted from the reference document containing the answer. As such, accurate answers can only be generated if a model can find relevant content within the provided document. We run a large-scale benchmark comprising several state-of-the-art LLMs to uncover differences in performance across models of various types and sizes in a context-conditional language modeling setting.

artificial intelligence, large language model, natural language, (9 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Neural Information Processing SystemsNov-15-2025, 07:04:27 GMT

QA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content João Monteiro,3, Pierre-André Noël

Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions.

dataset, information, reference document, (16 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Singapore (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (1.00)
Government (0.93)
Information Technology > Security & Privacy (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-10-2025, 05:32:30 GMT

6e4cdfdd909ea4e34bfc85a12774cba0-Paper-Conference.pdf

annotator, arxiv preprint arxiv, hallucination, (15 more...)

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Neural Information Processing SystemsOct-9-2025, 21:49:31 GMT

QA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content João Monteiro,3, Pierre-André Noël

Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions.

dataset, information, reference document, (15 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Singapore (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (1.00)
Government (0.93)
Information Technology > Security & Privacy (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceSep-23-2025

From Documents to Database: Failure Modes for Industrial Assets

Kabakci-Zorlu, Duygu, Lorenzi, Fabio, Sheehan, John, Lynch, Karol, Eck, Bradley

We propose an interactive system using foundation models and user-provided technical documents to generate Failure Mode and Effects Analyses (FMEA) for industrial equipment. Our system aggregates unstructured content across documents to generate an FMEA and stores it in a relational database. Leveraging this tool, the time required for creation of this knowledge-intensive content is reduced, outperforming traditional manual approaches. This demonstration showcases the potential of foundation models to facilitate the creation of specialized structured content for enterprise asset management systems.

failure location, large language model, natural language, (17 more...)

2509.17834

Country: Europe (0.14)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.56)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.49)

arXiv.org Artificial IntelligenceJun-3-2025

Towards Multi-dimensional Evaluation of LLM Summarization across Domains and Languages

Min, Hyangsuk, Lee, Yuho, Ban, Minjeong, Deng, Jiaqi, Kim, Nicole Hee-Yeon, Yun, Taewon, Su, Hang, Cai, Jason, Song, Hwanjun

Evaluation frameworks for text summarization have evolved in terms of both domain coverage and metrics. However, existing benchmarks still lack domain-specific assessment criteria, remain predominantly English-centric, and face challenges with human annotation due to the complexity of reasoning. To address these, we introduce MSumBench, which provides a multi-dimensional, multi-domain evaluation of summarization in English and Chinese. It also incorporates specialized assessment criteria for each domain and leverages a multi-agent debate system to enhance annotation quality. By evaluating eight modern summarization models, we discover distinct performance patterns across domains and languages. We further examine large language models as summary evaluators, analyzing the correlation between their evaluation and summarization capabilities, and uncovering systematic bias in their assessment of self-generated summaries. Our benchmark dataset is publicly available at https://github.com/DISL-Lab/MSumBench.

large language model, machine learning, reference document, (23 more...)

2506.00549

Country: North America > United States (0.67)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.67)
Government > Regional Government > North America Government > United States Government (0.67)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsMay-26-2025, 19:47:42 GMT

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content

artificial intelligence, large language model, natural language, (7 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial IntelligenceMar-26-2025

Scaling Laws of Synthetic Data for Language Models

Qin, Zeyu, Dong, Qingxiu, Zhang, Xingxing, Dong, Li, Huang, Xiaolong, Yang, Ziyi, Khademi, Mahmoud, Zhang, Dongdong, Awadalla, Hany Hassan, Fung, Yi R., Chen, Weizhu, Cheng, Minhao, Wei, Furu

Large language models (LLMs) achieve strong performance across diverse tasks, largely driven by high-quality web data used in pre-training. However, recent studies indicate this data source is rapidly depleting. Synthetic data emerges as a promising alternative, but it remains unclear whether synthetic datasets exhibit predictable scalability comparable to raw pre-training data. In this work, we systematically investigate the scaling laws of synthetic data by introducing SynthLLM, a scalable framework that transforms pre-training corpora into diverse, high-quality synthetic datasets. Our approach achieves this by automatically extracting and recombining high-level concepts across multiple documents using a graph algorithm. Key findings from our extensive mathematical experiments on SynthLLM include: (1) SynthLLM generates synthetic data that reliably adheres to the rectified scaling law across various model sizes; (2) Performance improvements plateau near 300B tokens; and (3) Larger models approach optimal performance with fewer training tokens. For instance, an 8B model peaks at 1T tokens, while a 3B model requires 4T. Moreover, comparisons with existing synthetic data generation and augmentation methods demonstrate that SynthLLM achieves superior performance and scalability. Our findings highlight synthetic data as a scalable and reliable alternative to organic pre-training corpora, offering a viable path toward continued improvement in model performance.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2503.19551

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > District of Columbia > Washington (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Eberhardinger, Manuel, Takenaka, Patrick, Grießhaber, Daniel, Maucher, Johannes

Anonymization of Documents for Law Enforcement with Machine Learning

arXiv.org Artificial IntelligenceJan-13-2025

The steadily increasing utilization of data-driven methods and approaches in areas that handle sensitive personal information such as in law enforcement mandates an ever increasing effort in these institutions to comply with data protection guidelines. In this work, we present a system for automatically anonymizing images of scanned documents, reducing manual effort while ensuring data protection compliance. Our method considers the viability of further forensic processing after anonymization by minimizing automatically redacted areas by combining automatic detection of sensitive regions with knowledge from a manually anonymized reference document. Using a self-supervised image model for instance retrieval of the reference document, our approach requires only one anonymized example to efficiently redact all documents of the same type, significantly reducing processing time. We show that our approach outperforms both a purely automatic redaction system and also a naive copy-paste scheme of the reference anonymization to other documents on a hand-crafted dataset of ground truth redactions.

machine learning, pattern recognition, reference document, (17 more...)

2501.07334

Country: Europe > Germany > Baden-Württemberg (0.16)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)