Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning (Supplementary File)