PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations
Neural Information Processing Systems
Expert-designed closed-ended benchmarks are indispensable for assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To address this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through knowledge-invariant perturbations. These perturbations employ human-like restatement techniques to generate on-the-fly test samples from static benchmarks, meticulously retaining knowledge-critical content while altering irrelevant details. Our toolkit further includes a suite of response consistency analyses that compare performance on raw vs. perturbed test sets to precisely assess LLMs' genuine knowledge capacity.
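The abstract describes comparing a model's answers on raw vs. perturbed test sets to measure response consistency. A minimal sketch of such an analysis follows; the function name and the two metrics (answer consistency and joint accuracy) are illustrative assumptions, not PertEval's actual API.

```python
# Sketch of a raw-vs-perturbed response consistency analysis.
# Assumption: multiple-choice answers are single option letters, and each
# perturbed item is knowledge-invariant, so its gold answer is unchanged.

def consistency_rate(raw_answers, perturbed_answers, gold):
    """Return (consistency, joint_accuracy):
    - consistency: fraction of items answered identically on the raw and
      perturbed versions, regardless of correctness.
    - joint_accuracy: fraction answered correctly on BOTH versions, a
      stricter proxy for genuine knowledge than raw accuracy alone.
    """
    assert len(raw_answers) == len(perturbed_answers) == len(gold)
    n = len(gold)
    consistent = sum(r == p for r, p in zip(raw_answers, perturbed_answers))
    both_correct = sum(
        r == p == g for r, p, g in zip(raw_answers, perturbed_answers, gold)
    )
    return consistent / n, both_correct / n

raw = ["A", "B", "C", "D", "A"]    # answers on the original benchmark
pert = ["A", "B", "D", "D", "B"]   # answers on the perturbed benchmark
gold = ["A", "B", "C", "A", "C"]   # gold answers (invariant to perturbation)
print(consistency_rate(raw, pert, gold))  # → (0.6, 0.4)
```

A large gap between raw accuracy and joint accuracy under knowledge-invariant perturbation is the kind of signal that suggests memorization or contamination rather than genuine knowledge.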