Why (and When) does Local SGD Generalize Better than SGD?