Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
–Neural Information Processing Systems
However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability.
Feb-11-2026, 21:18:21 GMT