On the Benefits of Large Learning Rates for Kernel Methods