Follow-the-Regularized-Leader (FTRL) is a powerful framework for a wide range of online learning problems. By designing its regularizer and learning rate to adapt to past observations, one can make FTRL adjust automatically to various properties of the underlying environment.
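The following is a minimal sketch for intuition only. It assumes the classic experts setting: K actions, with a full loss vector revealed after each round. FTRL with the negative-entropy regularizer psi(p) = sum_i p_i log p_i plays p_t = argmin over the probability simplex of <L_{t-1}, p> + psi(p) / eta_t, where L_{t-1} is the cumulative loss vector. This minimization has the closed form p_{t,i} proportional to exp(-eta_t * L_{t-1,i}). The learning-rate rule eta_t = sqrt(ln K / (1 + sum_{s<t} max_i |l_{s,i}|^2)) is a hypothetical data-dependent choice picked for illustration. It is not the regularizer or learning rate proposed in this work, and the function names are likewise hypothetical.

```python
import math
from typing import List, Sequence


def ftrl_entropy_weights(cum_loss: Sequence[float], eta: float) -> List[float]:
    """One FTRL step on the simplex with regularizer psi(p) = sum_i p_i log p_i.

    The minimizer of <L, p> + psi(p) / eta over the simplex is p_i ∝ exp(-eta * L_i).
    """
    m = min(cum_loss)  # shift for numerical stability; normalization cancels it
    w = [math.exp(-eta * (l - m)) for l in cum_loss]
    z = sum(w)
    return [x / z for x in w]


def run_adaptive_ftrl(losses: Sequence[Sequence[float]]) -> List[List[float]]:
    """Play FTRL against a fixed sequence of loss vectors; return each round's play."""
    k = len(losses[0])
    cum = [0.0] * k   # cumulative loss of each action so far
    sq_sum = 0.0      # running sum of squared max-abs losses observed so far
    plays = []
    for loss in losses:
        # Hypothetical adaptive learning rate: it shrinks as observed losses accumulate.
        eta = math.sqrt(math.log(k) / (1.0 + sq_sum))
        p = ftrl_entropy_weights(cum, eta)
        plays.append(p)
        # The loss vector is revealed only after p has been played.
        for i, l in enumerate(loss):
            cum[i] += l
        sq_sum += max(abs(l) for l in loss) ** 2
    return plays
```

For example, `run_adaptive_ftrl([[1.0, 0.0]] * 3)` starts from the uniform distribution and shifts weight toward the second action as that action's cumulative advantage grows. At the same time, the step size shrinks in response to the accumulated losses.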
In this work, we focus on mathematical problem-solving to explore ways of enhancing the mathematical reasoning abilities of pretrained LLMs. We investigate instruction tuning (Longpre et al., 2023; Wang