Appendix A Loss aware w

Neural Information Processing Systems 

All reported results were computed on the test set. For each model, we evaluated the checkpoint with the lowest validation loss over the 100 training epochs, with validation run at the end of every epoch.
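
The selection procedure described above can be sketched as follows. This is a minimal illustration, not the code used for the experiments: the names `select_best_checkpoint`, `train_one_epoch`, and `validation_loss` are hypothetical placeholders, and keeping the earliest epoch when two epochs tie is an assumption that the text does not specify.

```python
import copy
from typing import Any, Callable, Tuple


def select_best_checkpoint(
    state: Any,
    train_one_epoch: Callable[[Any], Any],
    validation_loss: Callable[[Any], float],
    num_epochs: int = 100,
) -> Tuple[Any, int, float]:
    """Train for num_epochs and keep the state with the lowest validation loss.

    train_one_epoch and validation_loss are hypothetical callables supplied
    by the caller; they stand in for the actual training and validation code.
    """
    best_state, best_epoch, best_loss = None, -1, float("inf")
    for epoch in range(1, num_epochs + 1):
        state = train_one_epoch(state)
        # Validate once, at the end of the epoch.
        loss = validation_loss(state)
        # Strict comparison: on ties the earliest epoch is kept (assumption).
        if loss < best_loss:
            # Copy the state so later training cannot overwrite the snapshot.
            best_state, best_epoch, best_loss = copy.deepcopy(state), epoch, loss
    return best_state, best_epoch, best_loss
```

The returned state is then the one evaluated on the test set. In a toy run where the state is a single number that increases by 1 each epoch and the validation loss is `(s - 3.0) ** 2`, five epochs select epoch 3, since the loss reaches its minimum there and rises afterwards.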