Recommendations and Reporting Checklist for Rigorous & Transparent Human Baselines in Model Evaluations
