Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective

Neural Information Processing Systems 

However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability.
