Temperature check: theory and practice for training models with softmax-cross-entropy losses