Optimal Condition for Initialization Variance in Deep Neural Networks: An SGD Dynamics Perspective