On the Trade-off between Flatness and Optimization in Distributed Learning
