A metrological framework for uncertainty evaluation in machine learning classification models