On Measuring and Quantifying Performance: Error Rates, Surrogate Loss, and an Example in SSL
Marco Loog, Jesse H. Krijthe, Are C. Jensen
The aim of semi-supervised learning is to improve supervised learners by exploiting potentially large amounts of unlabeled data, which are typically easier to obtain [1]. Up to now, however, semi-supervised learners have shown mixed results when it comes to such improvements: semi-supervision does not always result in lower expected error rates. On the contrary, severely deteriorated performance has been observed in empirical studies, and theory shows that improvement guarantees can often be provided only under rather stringent conditions [2-5]. The principal suggestion put forward in this chapter is that, when dealing with semi-supervised learning, one may want not only to study the (expected) error rates these classifiers produce, but also to measure the classifiers' performance by means of the intrinsic loss they may be optimizing in the first place. That is, for classification routines that optimize a so-called surrogate loss at training time (which is what many machine learning and Bayesian decision-theoretic approaches do [6, 7]), we propose to also investigate how this loss behaves on the test set, as this can provide an alternative view of the classifier's behavior that a mere error rate cannot capture. In fact, though the main example is concerned with semi-supervision, we would argue that similar considerations might be beneficial in other learning scenarios. One instance is active learning [8], where, rather than sampling randomly from one's input data to provide these instances with labels, one aims to do the sampling in a systematic way, trying to keep labeling cost as low as possible or, similarly, to learn from as few labeled examples as possible. Here, too, it may (or, we believe, it should) be of interest to not only compare the error rates that different approaches (e.g.
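The contrast between an error rate and a test-set surrogate loss can be illustrated with a small sketch. This is not the chapter's own experiment: it assumes linear scoring functions f(x) with labels in {-1, +1}, and it uses two common surrogates (logistic and quadratic loss) purely for illustration. The toy data and the names `error_rate`, `logistic_loss`, `squared_loss`, `clf_a`, and `clf_b` are hypothetical. Both classifiers make no mistakes on the test set, so their error rates are identical, yet their surrogate losses differ.

```python
import math

def error_rate(f, xs, ys):
    """Fraction of test points whose predicted sign disagrees with the label (0-1 loss)."""
    mistakes = sum(1 for x, y in zip(xs, ys) if (1 if f(x) >= 0 else -1) != y)
    return mistakes / len(xs)

def logistic_loss(f, xs, ys):
    """Mean logistic surrogate loss log(1 + exp(-y * f(x))) over the test set."""
    return sum(math.log1p(math.exp(-y * f(x))) for x, y in zip(xs, ys)) / len(xs)

def squared_loss(f, xs, ys):
    """Mean quadratic surrogate loss (y - f(x))^2, the loss a least squares classifier minimizes."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical 1-D test set with labels in {-1, +1}.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, -1, 1, 1]

# Two hypothetical linear classifiers that agree on every predicted label.
clf_a = lambda x: 0.5 * x
clf_b = lambda x: 3.0 * x

for name, clf in [("A", clf_a), ("B", clf_b)]:
    print(
        f"{name}: error={error_rate(clf, xs, ys):.2f}, "
        f"logistic={logistic_loss(clf, xs, ys):.4f}, "
        f"squared={squared_loss(clf, xs, ys):.4f}"
    )
```

Here the error rate cannot separate the two classifiers, while either surrogate does. The two surrogates even rank them differently (quadratic loss favors A, logistic loss favors B), which is one reason to evaluate with the loss the classifier actually optimized at training time.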
Jul-13-2017