Researchers find that large language models struggle with math