PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations
Neural Information Processing Systems
Expert-designed closed-ended benchmarks are indispensable for assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To rectify this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through knowledge-invariant perturbations.
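To make the idea of a knowledge-invariant perturbation concrete, here is a minimal illustrative sketch (not PertEval's actual implementation): shuffling the answer options of a multiple-choice question changes only the presentation, while the knowledge required to answer it stays the same. The function name and signature below are assumptions for illustration.

```python
import random

def shuffle_options(question, options, answer_idx, seed=0):
    """Apply a knowledge-invariant perturbation: reorder the answer
    options of a multiple-choice item and remap the correct index.
    The knowledge needed to answer is unchanged; only the surface
    form of the test item moves."""
    rng = random.Random(seed)  # fixed seed for reproducible perturbations
    order = list(range(len(options)))
    rng.shuffle(order)
    perturbed = [options[i] for i in order]
    # Track where the original correct option landed.
    new_answer_idx = order.index(answer_idx)
    return question, perturbed, new_answer_idx

# Example: the correct answer ("Paris") follows its option after shuffling.
q, opts, ans = shuffle_options(
    "What is the capital of France?",
    ["Berlin", "Paris", "Madrid", "Rome"],
    answer_idx=1,
    seed=42,
)
assert opts[ans] == "Paris"
```

A model whose accuracy drops sharply under such perturbations is likely relying on memorized surface patterns (e.g., option position) rather than genuine knowledge.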