PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations

Neural Information Processing Systems 

Expert-designed closed-ended benchmarks are indispensable for assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To rectify this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through knowledge-invariant perturbations.
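To make "knowledge-invariant perturbation" concrete, the sketch below shows one plausible perturbation of a closed-ended item: shuffling the answer options while remapping the gold label, so the knowledge required to answer is unchanged but positional memorization of a contaminated test set no longer helps. The `MCQItem` class and `perturb_option_order` function are illustrative assumptions, not PertEval's actual API.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MCQItem:
    """A closed-ended (multiple-choice) benchmark item. Hypothetical schema."""
    question: str
    options: list[str]   # e.g. ["Paris", "London", "Rome", "Berlin"]
    answer_index: int    # index of the correct option


def perturb_option_order(item: MCQItem, seed: int = 0) -> MCQItem:
    """Shuffle the options and remap the gold label.

    The knowledge needed to answer is unchanged (knowledge-invariant),
    but a model that merely memorized "the answer is B" will fail.
    """
    rng = random.Random(seed)
    order = list(range(len(item.options)))
    rng.shuffle(order)
    return MCQItem(
        question=item.question,
        options=[item.options[i] for i in order],
        # New position of the original correct option.
        answer_index=order.index(item.answer_index),
    )


if __name__ == "__main__":
    item = MCQItem(
        question="What is the capital of France?",
        options=["Paris", "London", "Rome", "Berlin"],
        answer_index=0,
    )
    perturbed = perturb_option_order(item, seed=42)
    print(perturbed.options, "->", perturbed.options[perturbed.answer_index])
```

Comparing a model's accuracy on original versus perturbed items of this kind is one way such a toolkit could separate genuine knowledge from memorization of surface form.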