Appendix A Model and training procedure: details

Neural Information Processing Systems 

All experiments used the same model and training procedure unless stated otherwise. The ResNet had two blocks per group and channels per group (16, 32, 32, 64), and was not pre-trained. The integer labels were embedded using a standard embedding layer. In all figures, (shaded) error bars indicate standard deviation around the mean. As a future extension, the model could be adapted to handle novel labels as well.
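
To make the ResNet description concrete, the following is a minimal sketch that records the stated hyperparameters (two blocks per group, channels per group (16, 32, 32, 64), no pre-training) and lists the channel widths of each residual block. The names `ResNetSpec` and `block_plan` are hypothetical and do not come from the original code. The stem width is not given in the text and is assumed here to equal the first group's width. Strides, kernel sizes, and normalization are left out because they are not specified.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ResNetSpec:
    # Values stated in the text.
    blocks_per_group: int = 2
    channels_per_group: Tuple[int, ...] = (16, 32, 32, 64)
    pretrained: bool = False
    # Assumption: the stem width is not given in the text; it is set to the
    # first group's width purely for illustration.
    stem_channels: int = 16


def block_plan(spec: ResNetSpec) -> List[Tuple[int, int, int, int]]:
    """Return (group, block, in_channels, out_channels) for every residual block."""
    plan = []
    in_ch = spec.stem_channels
    for g, out_ch in enumerate(spec.channels_per_group):
        for b in range(spec.blocks_per_group):
            plan.append((g, b, in_ch, out_ch))
            in_ch = out_ch  # later blocks in a group keep the group's width
    return plan


spec = ResNetSpec()
plan = block_plan(spec)
for g, b, c_in, c_out in plan:
    print(f"group {g} block {b}: {c_in} -> {c_out} channels")
```

Under these assumptions the plan contains eight residual blocks, and the channel width changes only at the first block of the second and fourth groups.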