Effects of label noise on the classification of outlier observations

de Farias, Matheus Vinícius Barreto, de Castro, Mario

arXiv.org Machine Learning 

This study presents results from experiments in which, before training a classification model, we add noise to the labels of the training set, so that the information contained in this set is not entirely correct. Most datasets encountered in practice contain some degree of noise, which highlights the importance of this type of study for new techniques before they are deployed in real-world applications. Here we measure the impact of label noise on BCOPS (Guan & Tibshirani, 2022), an algorithm based on conformal prediction (Vovk et al., 2005) which, combined with other machine learning methods, constructs prediction sets for the test observations in classification tasks. Prediction sets contain the possible values (in regression tasks) or possible classes (in classification tasks) for new observations and are constructed so that the probability of the true value or class being contained within them meets a coverage guarantee. Guan & Tibshirani (2022) emphasize the possibility of using these prediction sets to detect outlier observations, that is, observations whose true class was not present during training. We therefore measure both the classification coverage and the abstention rate on outlier observations of the BCOPS algorithm under label noise, considering some of the datasets and machine learning algorithms used by Guan & Tibshirani (2022).
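To make the notions of prediction sets, coverage, and abstention on outliers concrete, the following is a minimal sketch of split conformal prediction for a toy two-class problem. It is not the BCOPS algorithm itself: the Gaussian class-conditional model, the density-based conformity score, and all parameter values are illustrative assumptions. An empty prediction set plays the role of abstention, flagging a possible outlier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setting (illustrative assumption): class `label` has
# features drawn from N(2 * label, 1).
def sample(n, label):
    return rng.normal(loc=2.0 * label, scale=1.0, size=n)

def density(x, label):
    # Class-conditional density under the assumed Gaussian model,
    # used here as the conformity score.
    return np.exp(-0.5 * (x - 2.0 * label) ** 2) / np.sqrt(2.0 * np.pi)

n_cal = 500
alpha = 0.1  # target miscoverage: true class kept with probability >= 90%

# Split conformal calibration: per class, find the score threshold at the
# floor(alpha * (n + 1))-th smallest calibration score (finite-sample correction).
thresholds = {}
for label in (0, 1):
    scores = density(sample(n_cal, label), label)
    k = int(np.floor(alpha * (n_cal + 1)))
    thresholds[label] = np.sort(scores)[k - 1] if k >= 1 else -np.inf

def prediction_set(x):
    # Include every class whose conformity score clears its threshold;
    # an empty set is an abstention (potential outlier).
    return {lab for lab in (0, 1) if density(x, lab) >= thresholds[lab]}

# Empirical coverage on fresh class-1 observations ...
x_test = sample(2000, 1)
coverage = np.mean([1 in prediction_set(x) for x in x_test])
# ... and abstention rate on an outlier class never seen in training.
x_out = rng.normal(loc=10.0, scale=1.0, size=2000)
abstain = np.mean([len(prediction_set(x)) == 0 for x in x_out])
print(f"coverage on class 1: {coverage:.3f}, abstention on outliers: {abstain:.3f}")
```

In this clean-label sketch the empirical coverage lands near the nominal 90% and the abstention rate on the far-away outlier class is close to 1; the experiments described above ask how these two quantities degrade once the calibration labels are corrupted by noise.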