Evaluating Latent Knowledge of Public Tabular Datasets in Large Language Models

Silvestri, Matteo, Giorgi, Flavio, Silvestri, Fabrizio, Tolomei, Gabriele

Oct-24-2025–arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly evaluated on their ability to reason over structured data, yet such assessments often overlook a crucial confound: dataset contamination. In this work, we investigate whether LLMs exhibit prior knowledge of widely used tabular benchmarks such as Adult Income, Titanic, and others. Through a series of controlled probing experiments, we reveal that contamination effects emerge exclusively for datasets containing strong semantic cues-for instance, meaningful column names or interpretable value categories. In contrast, when such cues are removed or randomized, performance sharply declines to near-random levels. These findings suggest that LLMs' apparent competence on tabular reasoning tasks may, in part, reflect memorization of publicly available datasets rather than genuine generalization. We discuss implications for evaluation protocols and propose strategies to disentangle semantic leakage from authentic reasoning ability in future LLM assessments.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

Oct-24-2025

arXiv.org PDF

Add feedback

Country:
- North America (0.28)

Genre:
- Research Report
  - Experimental Study (0.67)
  - New Finding (0.67)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found