Predicting the Generalization Gap in Deep Networks with Margin Distributions