Large scale distributed neural network training through online distillation