Efficient multi-prompt evaluation of LLMs

Feb-9-2026, 19:31:56 GMT–Neural Information Processing Systems

Most popular benchmarks for comparing LLMs rely on a limited set of prompt templates, which may not fully capture the LLMs' abilities and can affect the reproducibility of results on leaderboards. Many recent works empirically verify prompt sensitivity and advocate for changes in LLM evaluation.

large language model, machine learning, natural language

Country:
- Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks (1.00)

Efficient multi-prompt evaluation of LLMs Felipe Maia Polo
