Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs