On Focal Loss for Class-Posterior Probability Estimation: A Theoretical Perspective