AITopics | data leakage

data leakage

e48880ea81caa7836e6a0694049093ae-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 16:07:46 GMT

artificial intelligence, core feature, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
North America > United States > Virginia (0.04)
South America > Brazil (0.04)
(4 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)
Banking & Finance > Insurance (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations

Neural Information Processing Systems, Feb-15-2026, 14:02:09 GMT

How to evaluate Large Language Models (LLMs) in code generation remains an open question.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Vienna (0.14)
Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Law (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

4eb32e1569085c8f8883163665bf3c0a-Paper-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 21:22:57 GMT

artificial intelligence, leakage, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Greece (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

2f8ee6a3d766b426d2618e555b5aeb39-Paper-Conference.pdf

Neural Information Processing Systems, Feb-10-2026, 15:01:09 GMT

arxiv preprint arxiv, benchmark, vlm, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Nebraska (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations

Neural Information Processing Systems, Dec-26-2025, 06:36:31 GMT

How to evaluate Large Language Models (LLMs) in code generation remains an open question. Many benchmarks have been proposed, but they have two limitations: data leakage and a lack of domain-specific evaluation. The former hurts the fairness of benchmarks, and the latter hinders practitioners from selecting superior LLMs for specific programming domains. To address these two limitations, we propose a new benchmark, EvoCodeBench, which has the following advances: (1) Evolving data. EvoCodeBench will be dynamically updated every period (e.g., 6 months) to avoid data leakage. This paper releases the first version, EvoCodeBench-2403, containing 275 samples from 25 repositories. (2)
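
One way to picture the evolving-data idea is as a date filter: keep only samples whose source repository was created after a model's training cutoff. The sketch below is an illustration under that assumption, not the benchmark's actual pipeline; the `Sample` record, its `repo_created` field, and all dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    # Hypothetical record layout; the benchmark's real schema may differ.
    sample_id: str
    repo_created: date


def leakage_free(samples, training_cutoff):
    """Keep samples whose repository was created strictly after the cutoff."""
    return [s for s in samples if s.repo_created > training_cutoff]


# Invented example data, for illustration only.
samples = [
    Sample("a", date(2023, 6, 1)),
    Sample("b", date(2023, 12, 15)),
    Sample("c", date(2024, 2, 1)),
]
kept = leakage_free(samples, training_cutoff=date(2023, 9, 30))
kept_ids = [s.sample_id for s in kept]
```

A strict comparison is used so that a repository created on the cutoff date itself is treated as possibly seen during training.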

artificial intelligence, large language model, natural language, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Are We on the Right Way for Evaluating Large Vision-Language Models?

Neural Information Processing Systems, Dec-24-2025, 19:17:46 GMT

Large vision-language models (LVLMs) have recently achieved rapid progress, sparking numerous studies to evaluate their multi-modal capabilities. However, we dig into current evaluation works and identify two primary issues: 1) Visual content is unnecessary for many samples. The answers can be directly inferred from the questions and options, or from the world knowledge embedded in LLMs. This phenomenon is prevalent across current benchmarks. For instance, GeminiPro achieves 42.7% on the MMMU benchmark without any visual input, and outperforms the random choice baseline across six benchmarks by nearly 24% on average.
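
The comparison described above (text-only accuracy against a random-choice baseline) can be sketched in a few lines. This is a minimal illustration with invented toy data; the function names are hypothetical, and the numbers are unrelated to the figures reported in the abstract.

```python
def accuracy(predictions, answers):
    """Fraction of predictions that match the reference answers."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def random_choice_baseline(option_counts):
    """Expected accuracy of uniform guessing, averaged over questions."""
    return sum(1 / n for n in option_counts) / len(option_counts)


# Toy data, invented for illustration only.
answers = ["A", "B", "C", "D"]
text_only_predictions = ["A", "B", "C", "A"]  # model saw no image
option_counts = [4, 4, 4, 4]

blind_acc = accuracy(text_only_predictions, answers)
chance = random_choice_baseline(option_counts)
gap = blind_acc - chance  # a large gap suggests the images are not needed
```

A text-only score well above chance indicates that the questions can be answered from the text or from prior knowledge alone.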

artificial intelligence, large language model, natural language, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.60)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.33)

Revisiting Pre-trained Language Models for Vulnerability Detection

Li, Youpeng, Qi, Weiliang, Wang, Xuyu, Yu, Fuxun, Wang, Xinda

arXiv.org Artificial Intelligence, Nov-25-2025

The rapid advancement of pre-trained language models (PLMs) has demonstrated promising results for various code-related tasks. However, their effectiveness in detecting real-world vulnerabilities remains a critical challenge. While existing empirical studies evaluate PLMs for vulnerability detection (VD), they suffer from data leakage, limited scope, and superficial analysis, hindering the accuracy and comprehensiveness of evaluations. This paper begins by revisiting the common issues in existing research on PLMs for VD through the evaluation pipeline. It then proceeds with an accurate and extensive evaluation of 18 PLMs on high-quality datasets that feature accurate labeling, diverse vulnerability types, and various projects. Specifically, we compare the performance of PLMs under both fine-tuning and prompt engineering, assess their effectiveness and generalizability across various training and testing settings, and analyze their robustness to a series of perturbations. Our findings reveal that PLMs incorporating pre-training tasks designed to capture the syntactic and semantic patterns of code outperform both general-purpose PLMs and those solely pre-trained or fine-tuned on large code corpora. However, these models face notable challenges in real-world scenarios, such as difficulties in detecting vulnerabilities with complex dependencies, handling perturbations introduced by code normalization and abstraction, and identifying semantic-preserving vulnerable code transformations. Also, the truncation caused by the limited context windows of PLMs can lead to a non-negligible number of labeling errors, which is overlooked by previous work. This study underscores the importance of thorough evaluations of model performance in practical scenarios and outlines future directions to help enhance the effectiveness of PLMs for realistic VD applications.
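
The truncation problem mentioned above can be illustrated with a simple check: if the vulnerable line falls outside the model's context window, the truncated input no longer contains the evidence for its "vulnerable" label. The sketch below assumes whitespace tokenization and a known vulnerable line index, both simplifications; real PLM tokenizers split code differently, and the helper name is hypothetical.

```python
def label_survives_truncation(lines, vulnerable_line_idx, max_tokens):
    """Return True if the whole vulnerable line fits in the first max_tokens tokens.

    Tokens are approximated by whitespace splitting (an assumption).
    """
    used = 0
    for idx, line in enumerate(lines):
        used += len(line.split())
        if idx == vulnerable_line_idx:
            return used <= max_tokens
    raise IndexError("vulnerable_line_idx is outside the function body")


# Toy C function; the unchecked strcpy on line index 2 is the flaw.
function_lines = [
    "int f(char *s) {",
    "char buf[8];",
    "strcpy(buf, s);",
    "return 0;",
    "}",
]

kept_with_small_window = label_survives_truncation(function_lines, 2, max_tokens=6)
kept_with_large_window = label_survives_truncation(function_lines, 2, max_tokens=8)
```

Samples for which the check fails could be counted or set aside before evaluation, so that truncated inputs are not scored against labels they can no longer support.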

large language model, machine learning, plm, (21 more...)

arXiv.org Artificial Intelligence

arXiv:2507.16887

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > India > Karnataka > Bengaluru (0.05)
(28 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Data Leakage and Deceptive Performance: A Critical Examination of Credit Card Fraud Detection Methodologies

Hayat, Khizar, Magnier, Baptiste

arXiv.org Artificial Intelligence, Nov-11-2025

This study critically examines the methodological rigor in credit card fraud detection research, revealing how fundamental evaluation flaws can overshadow algorithmic sophistication. Through deliberate experimentation with improper evaluation protocols, we demonstrate that even simple models can achieve deceptively impressive results when basic methodological principles are violated. Our analysis identifies four critical issues plaguing current approaches: (1) pervasive data leakage from improper preprocessing sequences, (2) intentional vagueness in methodological reporting, (3) inadequate temporal validation for transaction data, and (4) metric manipulation through recall optimization at precision's expense. We present a case study showing how a minimal neural network architecture with data leakage outperforms many sophisticated methods reported in literature, achieving 99.9% recall despite fundamental evaluation flaws. These findings underscore that proper evaluation methodology matters more than model complexity in fraud detection research. The study serves as a cautionary example of how methodological rigor must precede architectural sophistication, with implications for improving research practices across machine learning applications.
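
Issues (1) and (3) above concern the order of operations: splitting chronologically first, then fitting preprocessing statistics on the training portion only. The sketch below shows that order using only the standard library. It is not the authors' code; the row layout (`time`, `amount`) and the values are invented for illustration.

```python
from statistics import mean, pstdev


def temporal_split(rows, train_fraction=0.8):
    """Sort by time and cut once, so no future transaction lands in training."""
    ordered = sorted(rows, key=lambda r: r["time"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def fit_scaler(values):
    """Compute standardization statistics from training values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against zero variance
    return mu, sigma


def transform(values, mu, sigma):
    """Apply previously fitted statistics; never refit on test data."""
    return [(v - mu) / sigma for v in values]


# Invented transactions, deliberately out of chronological order.
rows = [
    {"time": t, "amount": a}
    for t, a in [(3, 30.0), (1, 10.0), (2, 20.0), (5, 500.0), (4, 40.0)]
]

train, test = temporal_split(rows, train_fraction=0.8)
mu, sigma = fit_scaler([r["amount"] for r in train])
test_scaled = transform([r["amount"] for r in test], mu, sigma)
```

Fitting the scaler on all rows before splitting would let the large test-period amount influence `mu` and `sigma`, which is the kind of preprocessing leakage the abstract describes.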

artificial intelligence, deep learning, machine learning, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.3390/math13162563

arXiv:2506.02703

Country: