How to tell if your model is over-fit using unlabeled data
In many settings, unlabeled data is plentiful (think images, text, etc), while sufficient labeled data for supervised learning might be harder to obtain. In these situations, it can be difficult to determine how well the model will generalize. Most methods for assessing model performance rely on labeled data alone, e.g. Without enough labeled data these can be unreliable. Is there anything more we can learn about the model's ability to generalize from unlabeled data? In this article, I demonstrate how unlabeled data can frequently be used to bound test loss.
Jun-28-2020, 18:40:20 GMT
- Technology: