STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning

Neural Information Processing Systems 

For example, [5] demonstrated that LLMs explicitly trained to use "scratchpads" for intermediate steps can attain perfect in-distribution performance on arithmetic, and strong out-of-distribution generalization, while models trained to predict answers directly fail to do either.
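To make the contrast concrete, the following minimal sketch builds two possible training targets for the same addition problem: one containing only the answer, and one that writes out intermediate steps in the spirit of a scratchpad. The function names and the line format are hypothetical and chosen for illustration; they are not the format used in [5], and the sketch assumes non-negative integers.

```python
def direct_target(a: int, b: int) -> str:
    """Training target that contains only the final answer."""
    return str(a + b)


def scratchpad_target(a: int, b: int) -> str:
    """Training target that spells out digit-by-digit addition with carries.

    Hypothetical format for illustration only; assumes a, b >= 0.
    """
    xs, ys = str(a)[::-1], str(b)[::-1]  # least-significant digit first
    lines, digits, carry = [], [], 0
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        lines.append(
            f"{x} + {y} + {carry} = {total}: write {total % 10}, carry {total // 10}"
        )
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))  # leftover carry becomes the leading digit
    answer = "".join(reversed(digits))
    lines.append(f"answer: {answer}")
    return "\n".join(lines)


if __name__ == "__main__":
    print("direct:", direct_target(29, 57))
    print("scratchpad:")
    print(scratchpad_target(29, 57))
```

A model trained on the second kind of target sees every carry explicitly, whereas a model trained on the first must produce the answer in one step.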