Plateau Phenomenon in Gradient Descent Training of ReLU networks: Explanation, Quantification and Avoidance