Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation