Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
