On the Opportunities of Large Language Models for Programming Process Data

John Edwards, Arto Hellas, Juho Leinonen

arXiv.org Artificial Intelligence 

The level of detail of feedback influences its effectiveness [80], and feedback can be given at many levels, ranging from targeting how to work on and complete specific tasks to addressing personal characteristics and behavior [26, 36, 59]. In teaching and learning programming, automated assessment systems have been a key tool for providing feedback at scale for more than half a century [30, 36, 61]. Researchers have sought to automate step-by-step guidance [78], provide hints during the programming process [55], improve programming error messages [6], and aid in providing textual feedback by grouping similar code submissions together [23, 37, 58].

To support the understanding of how novices construct programs, researchers and educators have been collecting increasing amounts of data on students' programming processes [31]. Such data can be collected at multiple granularities, ranging from final course assignment submissions to the individual keystrokes made while solving the assignments [31]. Programming process data has, for example, been used to play back how students construct their programs step by step or keystroke by keystroke, creating a broader understanding of the process [27, 73, 83]. So far, despite shared efforts towards providing timely feedback to students [33], the potential of fine-grained programming process data for feedback purposes remains largely untapped.

Large Language Models (LLMs) are a potential tool for transforming programming process data into actionable feedback. Within computing education research, LLMs have broadened the horizon of what computing education researchers and practitioners can achieve [65], even calling for rethinking how computer science and programming are taught [16].
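To make the playback idea concrete, the sketch below replays a sequence of keystroke-level edit events to reconstruct each intermediate state of a student's program. The event schema (`offset`, `removed`, `inserted`) is a hypothetical simplification for illustration; real process-data collection tools use richer formats, but the replay mechanism is the same.

```python
from dataclasses import dataclass

# Hypothetical keystroke-level edit event; real datasets record richer
# fields (timestamps, file identifiers, etc.), but this captures the core.
@dataclass
class Edit:
    offset: int    # character position where the edit applies
    removed: str   # text deleted at offset ("" for a pure insertion)
    inserted: str  # text inserted at offset ("" for a pure deletion)

def replay(events):
    """Apply edits in order, yielding each intermediate program state."""
    text = ""
    for e in events:
        # Sanity-check that the log is consistent with the current state.
        assert text[e.offset:e.offset + len(e.removed)] == e.removed
        text = text[:e.offset] + e.inserted + text[e.offset + len(e.removed):]
        yield text

# Example log: the student types a line, then fixes an undefined variable.
events = [
    Edit(0, "", "print(x)"),
    Edit(6, "x", "'hi'"),
]
states = list(replay(events))
# states[-1] == "print('hi')"
```

Iterating over the yielded states is what enables step-by-step visualizations of the programming process, and the same reconstructed snapshots could serve as input to an LLM when generating process-aware feedback.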