Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
