Towards Understanding Why Lookahead Generalizes Better Than SGD and Beyond (Supplementary File)
Pan Zhou, Hanshu Yan
