Data Parallelism and Distributed Deep Learning at Production Scale (Part 2)