
Generalization performance is a central goal in machine learning, with explicit generalization strategies needed when training over-parametrized models, like large neural networks. There is growing interest in using multiple, potentially auxiliary tasks, as one strategy towards this goal.