Master your Metrics with Calibration
Wissam Siblini, Jordan Fréry, Liyun He-Guelton, Frédéric Oblé, Yi-Qing Wang
Machine learning models deployed in real-world applications are often evaluated with precision-based metrics such as the F1-score or AUC-PR (Area Under the Precision-Recall Curve). Because such metrics depend heavily on the class prior, they can lead to wrong conclusions about performance. For example, on non-stationary data streams, they do not let the user discern why a model's performance varies across periods. In this paper, we propose a way to calibrate these metrics so that they are no longer tied to the class prior: a probability-based readjustment of the metric to the value it would have if the class prior were equal to a reference prior (a user-chosen parameter). We conduct a large number of experiments on balanced and imbalanced data to assess the behavior of the calibrated metrics, and show that they improve interpretability and give better control over what is actually measured. We describe specific real-world use cases where calibration is beneficial, such as model monitoring in production, reporting, and fairness evaluation.
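To make the idea concrete, here is a minimal sketch of such a calibration for precision, assuming (as the abstract suggests) that the readjustment rewrites the metric in terms of the prior-independent true-positive and false-positive rates and substitutes the reference prior for the empirical one. The function name `calibrated_precision` and the parameter `pi0` are illustrative, not the paper's API:

```python
import numpy as np

def calibrated_precision(y_true, y_pred, pi0=0.5):
    """Precision readjusted to the value it would take if the
    positive-class prior were pi0 instead of the empirical one.

    Uses the identity precision = TPR*pi / (TPR*pi + FPR*(1-pi)),
    where TPR and FPR do not depend on the class prior, and
    substitutes the reference prior pi0 for the empirical pi.
    Assumes both classes are present in y_true.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)      # true positives
    fp = np.sum(~y_true & y_pred)     # false positives
    fn = np.sum(y_true & ~y_pred)     # false negatives
    tn = np.sum(~y_true & ~y_pred)    # true negatives
    tpr = tp / (tp + fn)              # recall (prior-independent)
    fpr = fp / (fp + tn)              # fall-out (prior-independent)
    denom = tpr * pi0 + fpr * (1 - pi0)
    return tpr * pi0 / denom if denom > 0 else 0.0
```

Because TPR and FPR are unaffected by resampling the class ratio, two test periods with the same underlying classifier behavior but different positive-class priors yield the same calibrated precision even though their raw precisions diverge, which is exactly the monitoring scenario the abstract describes.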
Sep-6-2019