A Experimental Details

Neural Information Processing Systems 

We train a variety of ResNets to compare representations. We also train a wider ResNet-w2x and a narrower ResNet-0.5x. For experiments with changing label distribution, we also train the base ResNet-18. We perform the stitching on CIFAR-10. We also consider a simple family of 5-layer CNNs, with four Conv-BatchNorm-ReLU-MaxPool layers and a fully-connected output layer, following Page [2018].
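To make the 5-layer CNN layout concrete, the following is a minimal sketch that traces tensor shapes and parameter counts through four Conv-BatchNorm-ReLU-MaxPool blocks and a fully-connected output layer. The text does not specify channel widths, kernel size, padding, or pooling stride, so the values used here (widths doubling from a hypothetical `base_width` of 64, 3x3 convolutions with size-preserving padding and no bias, 2x2 max pooling) are assumptions for illustration only, as are the names `ConvBlock` and `build_cnn`. The 32x32 RGB input and 10 classes match CIFAR-10.

```python
from dataclasses import dataclass


@dataclass
class ConvBlock:
    """One Conv-BatchNorm-ReLU-MaxPool block (shape bookkeeping only)."""
    in_ch: int
    out_ch: int
    kernel: int = 3  # assumed 3x3 convolution with size-preserving padding
    pool: int = 2    # assumed 2x2 max pooling with stride 2

    def params(self) -> int:
        # Conv weights (assumed bias-free, since BatchNorm follows)
        # plus the BatchNorm scale and shift, one pair per output channel.
        return self.in_ch * self.out_ch * self.kernel ** 2 + 2 * self.out_ch

    def out_size(self, size: int) -> int:
        # The padded convolution keeps the spatial size; pooling divides it.
        return size // self.pool


def build_cnn(base_width=64, in_ch=3, image_size=32, num_classes=10):
    """Return (blocks, final spatial size, FC input dim, total parameters)."""
    # Assumed width schedule: base_width, 2x, 4x, 8x across the four blocks.
    widths = [base_width * 2 ** i for i in range(4)]
    blocks, size, ch = [], image_size, in_ch
    for w in widths:
        block = ConvBlock(ch, w)
        blocks.append(block)
        size = block.out_size(size)
        ch = w
    fc_in = ch * size * size                     # flattened feature dimension
    fc_params = fc_in * num_classes + num_classes  # weights plus biases
    total = sum(b.params() for b in blocks) + fc_params
    return blocks, size, fc_in, total


blocks, final_size, fc_in, total_params = build_cnn()
print(f"blocks={len(blocks)} final_size={final_size} "
      f"fc_in={fc_in} params={total_params}")
```

Under these assumptions, changing `base_width` scales every convolutional layer together, which is one plausible way to generate a family of such networks of varying width.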