A mixed policy to improve performance of language models on math problems