Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors

Phung, Tung, Pădurean, Victor-Alexandru, Cambronero, José, Gulwani, Sumit, Kohn, Tobias, Majumdar, Rupak, Singla, Adish, Soares, Gustavo

Jul-31-2023–arXiv.org Artificial Intelligence

Generative AI and large language models hold great promise in enhancing computing education by powering next-generation educational technologies for introductory programming. Recent works have studied these models for different scenarios relevant to programming education; however, these works are limited for several reasons, as they typically consider already outdated models or only specific scenario(s). Consequently, there is a lack of a systematic study that benchmarks state-of-the-art models for a comprehensive set of programming education scenarios. In our work, we systematically evaluate two models, ChatGPT (based on GPT-3.5) and GPT-4, and compare their performance with human tutors for a variety of scenarios. We evaluate using five introductory Python programming problems and real-world buggy programs from an online platform, and assess performance using expert-based annotations. Our results show that GPT-4 drastically outperforms ChatGPT (based on GPT-3.5) and comes close to human tutors' performance for several scenarios. These results also highlight settings where GPT-4 still struggles, providing exciting future directions on developing techniques to improve the performance of these models.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

Jul-31-2023

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.54)

Industry:
- Education > Educational Technology > Educational Software > Computer Based Training (0.81)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (0.85)
  - Natural Language
    - Chatbot (1.00)
    - Large Language Model (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found