Does the evaluation stand up to evaluation? A first-principle approach to the evaluation of classifiers