Characterization of Overfitting in Robust Multiclass Classification

Neural Information Processing Systems

Nonetheless, modern machine learning is adaptive in its nature. Prior information about a model's performance on the test set inevitably influences




Elo Uncovered: Robustness and Best Practices in Language Model Evaluation

Neural Information Processing Systems

However, while popular, the system's suitability for assessing entities with constant skill levels, such as LLMs, remains relatively unexplored. We study two fundamental axioms that evaluation methods should adhere to: reliability and transitivity.