EvalCards: A Framework for Standardized Evaluation Reporting

Dhar, Ruchira, Villegas, Danae Sanchez, Karamolegkou, Antonia, Schiavone, Alice, Yuan, Yifei, Chen, Xinyi, Li, Jiaang, Frank, Stella, De Grazia, Laura, Swain, Monorama, Brandl, Stephanie, Hershcovich, Daniel, Søgaard, Anders, Elliott, Desmond

Dec-1-2025–arXiv.org Artificial Intelligence

Evaluation has long been a central concern in NLP, and transparent reporting practices are more critical than ever in today's landscape of rapidly released open-access models. Drawing on a survey of recent work on evaluation and documentation, we identify three persistent shortcomings in current reporting practices: reproducibility, accessibility, and governance. We argue that existing standardization efforts remain insufficient and introduce Evaluation Disclosure Cards (EvalCards) as a path forward. EvalCards are designed to enhance transparency for both researchers and practitioners while providing a practical foundation to meet emerging governance requirements.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

Dec-1-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.68)
- Asia > Middle East (0.46)
- Europe > Austria
  - Vienna (0.14)

Genre:
- Overview (1.00)

Industry:
- Law (1.00)
- Health & Medicine (1.00)
- Government (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Issues > Social & Ethical Issues (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.93)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found