Assessing the Impact of Prompting, Persona, and Chain of Thought Methods on ChatGPT's Arithmetic Capabilities

Chen, Yuhao; Wong, Chloe; Yang, Hanwen; Aguenza, Juan; Bhujangari, Sai; Vu, Benthan; Lei, Xun; Prasad, Amisha; Fluss, Manny; Phuong, Eric; Liu, Minghao; Davis, James

arXiv.org Artificial Intelligence 

Large language models, such as ChatGPT, represent a transformative development in the field of Machine Learning. Demonstrating remarkable proficiency in generating coherent responses, these models effectively address intricate challenges, including mathematical problem-solving. To improve accuracy, researchers and practitioners have explored various methodologies, with prompting, persona, and Chain of Thought emerging as significant strategies aimed at augmenting ChatGPT's performance. This study's primary objective was to benchmark ChatGPT's default arithmetic capabilities and compare them with its performance when the prompting, persona, and Chain of Thought methods are applied. Prompting involves providing specific instructions or questions to guide a language model's response generation. Persona refers to the creation of a fictional character with a distinct personality, whose perspective is used to generate responses. Chain of Thought involves the sequential connection of ideas or concepts to guide response generation. To assess the arithmetic capabilities of ChatGPT, we used three distinct datasets: MATH [1], GSM8K [2], and MMLU [3]. Each of these datasets presents a range of mathematical problems across multiple domains and difficulty levels.
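The three strategies described above can be sketched as prompt templates. This is a minimal illustration only; the wording of each template is an assumption for demonstration, not the exact prompts used in the study:

```python
# Illustrative sketch: three ways to frame the same arithmetic question
# before sending it to a chat model. The template wording is hypothetical.

def baseline_prompt(question: str) -> str:
    # Default behavior: ask the question directly, with no extra framing.
    return question

def persona_prompt(question: str) -> str:
    # Persona: prepend a fictional character whose perspective shapes the answer.
    return ("You are a meticulous mathematics professor who double-checks "
            f"every calculation. Solve the following problem: {question}")

def chain_of_thought_prompt(question: str) -> str:
    # Chain of Thought: instruct the model to connect reasoning steps
    # sequentially before committing to a final answer.
    return f"{question}\nLet's think step by step, then state the final answer."

question = "What is 17 * 24?"
for build in (baseline_prompt, persona_prompt, chain_of_thought_prompt):
    print(build(question))
```

Each template would be sent as the user message in a chat-completion request; only the framing around the fixed question changes between conditions.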