EvalCards: A Framework for Standardized Evaluation Reporting
Dhar, Ruchira; Villegas, Danae Sanchez; Karamolegkou, Antonia; Schiavone, Alice; Yuan, Yifei; Chen, Xinyi; Li, Jiaang; Frank, Stella; De Grazia, Laura; Swain, Monorama; Brandl, Stephanie; Hershcovich, Daniel; Søgaard, Anders; Elliott, Desmond
arXiv.org Artificial Intelligence
Evaluation has long been a central concern in NLP, and transparent reporting practices are more critical than ever in today's landscape of rapidly released open-access models. Drawing on a survey of recent work on evaluation and documentation, we identify three persistent shortcomings in current reporting practices, concerning reproducibility, accessibility, and governance. We argue that existing standardization efforts remain insufficient and introduce Evaluation Disclosure Cards (EvalCards) as a path forward. EvalCards are designed to enhance transparency for both researchers and practitioners while providing a practical foundation to meet emerging governance requirements.
Dec-1-2025
- Country:
- Asia
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > Austria > Vienna (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > United States > New Jersey (0.04)
- Genre:
- Overview (1.00)
- Industry:
- Government (1.00)
- Health & Medicine (1.00)
- Law (1.00)
- Technology: