Optimizing Information-Theoretic Generalization Bounds via Anisotropic Noise in SGLD