What should an AI assessor optimise for?

Romero-Alvarado, Daniel, Martínez-Plumed, Fernando, Hernández-Orallo, José

Feb-1-2025–arXiv.org Artificial Intelligence

An AI assessor is an external, ideally independent system that predicts an indicator, e.g., a loss value, of another AI system. Assessors can leverage information from the test results of many other AI systems and have the flexibility of being trained on any loss function or scoring rule: from squared error to toxicity metrics. Here we address the question: is it always optimal to train the assessor for the target metric? Or could it be better to train for a different metric and then map predictions back to the target metric? Using twenty regression and classification problems with tabular data, we experimentally explore this question for, respectively, regression losses and classification scores with monotonic and non-monotonic mappings and find that, contrary to intuition, optimising for more informative metrics is not generally better. Surprisingly, some monotonic transformations are promising. For example, the logistic loss is useful for minimising absolute or quadratic errors in regression, and the logarithmic score helps maximise quadratic or spherical scores in classification.
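
The two training routes contrasted in the abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' experimental setup: the base system, the synthetic data, the one-feature linear assessor, and the helper names `fit_line` and `mean_abs_diff` are all hypothetical, and the signed error is chosen here as one possible "more informative" proxy that maps back to the absolute error through a non-monotonic function (the absolute value).

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a*x + b with a single feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mean_abs_diff(pred, true):
    """Average absolute gap between assessor predictions and observed values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(400)]

# Hypothetical base system: the true relation is y = 2x + noise,
# but the system under assessment always predicts y_hat = x.
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
signed_err = [x - y for x, y in zip(xs, ys)]  # y_hat - y (proxy metric)
abs_err = [abs(e) for e in signed_err]        # target metric

train, test = slice(0, 300), slice(300, 400)

# Route 1: train the assessor directly on the target metric.
a_d, b_d = fit_line(xs[train], abs_err[train])
direct_pred = [a_d * x + b_d for x in xs[test]]

# Route 2: train on the proxy metric, then map predictions back to the target.
a_p, b_p = fit_line(xs[train], signed_err[train])
proxy_pred = [abs(a_p * x + b_p) for x in xs[test]]

# Both routes are evaluated on the target metric.
direct_score = mean_abs_diff(direct_pred, abs_err[test])
proxy_score = mean_abs_diff(proxy_pred, abs_err[test])
print(f"direct: {direct_score:.3f}  proxy-then-map: {proxy_score:.3f}")
```

In this contrived setup a linear assessor cannot represent the V-shaped absolute error directly, so the proxy route scores lower. That outcome depends entirely on the data and the assessor family; the abstract reports that training on more informative metrics is not generally better.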

artificial intelligence, machine learning, natural language

Country:
- Europe (0.46)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Health & Medicine > Therapeutic Area (0.46)
- Leisure & Entertainment > Games (0.35)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Performance Analysis > Accuracy (0.46)
    - Statistical Learning > Regression (0.46)
  - Natural Language (1.00)