ea89621bee7c88b2c5be6681c8ef4906-AuthorFeedback.pdf

Neural Information Processing Systems 

In contrast, we use 10% of the training set9 for validation, and treat the validation set as apurely held-out test set (this also means that we train on less data).10 Wewillexplainthismoreclearly.30 both spheres are sufficiently tiny (i.e.