Collaborating Authors: Kiesler, Natalie


The Role of Generative AI in Software Student CollaborAItion

arXiv.org Artificial Intelligence

Collaboration is a crucial part of computing education. The increase in AI capabilities over the last couple of years is bound to profoundly affect all aspects of systems and software engineering, including collaboration. In this position paper, we consider a scenario where AI agents would be able to take on any role in collaborative processes in computing education. We outline these roles, the activities and group dynamics that software development currently includes, and discuss if and in what way AI could facilitate these roles and activities. The goal of our work is to envision and critically examine potential futures. We present scenarios suggesting how AI …

Khan [28] has proposed an inspiring vision of how AI could help realize personalized individual tutors for every learner. Complementing this, an expert panel from 2020 [49] draws a scenario where "AI supports orchestration of the multiple types of activities, learning partners, and interaction patterns that can enrich a classroom". We believe the possibilities are even broader, and to help think about them, we propose a thought experiment that not only accommodates emerging practices and visions but also suggests new use cases in education that (to the best of our knowledge) have not yet been explored.


Beyond the Hype: A Comprehensive Review of Current Trends in Generative AI Research, Teaching Practices, and Tools

arXiv.org Artificial Intelligence

Generative AI (GenAI) is advancing rapidly, and the literature in computing education is expanding almost as quickly. Initial responses to GenAI tools ranged from panic to utopian optimism. Many were quick to point out the opportunities and challenges of GenAI. Researchers reported that these new tools are capable of solving most introductory programming tasks and are causing disruptions throughout the curriculum. These tools can write and explain code, enhance error messages, create resources for instructors, and even provide feedback and help for students like a traditional teaching assistant. In 2024, new research started to emerge on the effects of GenAI usage in the computing classroom. These new data involve the use of GenAI to support classroom instruction at scale and to teach students how to code with GenAI. In support of the former, a new class of tools is emerging that can provide personalized feedback to students on their programming assignments or teach both programming and prompting skills at the same time. With the literature expanding so rapidly, this report aims to summarize and explain what is happening on the ground in computing classrooms. We provide a systematic literature review; a survey of educators and industry professionals; and interviews with educators using GenAI in their courses, educators studying GenAI, and researchers who create GenAI tools to support computing education. The triangulation of these methods and data sources expands the understanding of GenAI usage and perceptions at this critical moment for our community.


You're (Not) My Type -- Can LLMs Generate Feedback of Specific Types for Introductory Programming Tasks?

arXiv.org Artificial Intelligence

Background: Feedback, as one of the most influential factors for learning, has been the subject of a large body of research. It plays a key role in the development of educational technology systems and is traditionally rooted in deterministic feedback defined by experts and their experience. However, with the rise of generative AI and especially Large Language Models (LLMs), we expect feedback as part of learning systems to transform, especially in the context of programming. In the past, it was challenging to automate feedback for learners of programming. LLMs may create new possibilities to provide richer and more individualized feedback than ever before. Objectives: This paper aims to generate specific types of feedback for introductory programming tasks using LLMs. We revisit existing feedback taxonomies to capture the specifics of the generated feedback, such as randomness, uncertainty, and degrees of variation. Methods: We iteratively designed prompts for the generation of specific feedback types (as part of existing feedback taxonomies) in response to authentic student programs. We then evaluated the generated output and determined to what extent it reflected certain feedback types. Results and Conclusion: The present work provides a better understanding of different feedback dimensions and characteristics. The results have implications for future feedback research with regard to, for example, feedback effects and learners' informational needs. It further provides a basis for the development of new tools and learning systems for novice programmers, including feedback generated by AI.
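
As a concrete illustration of the approach, a prompt that targets one feedback type at a time might be composed as in the following minimal sketch. The type names echo common elaborated-feedback taxonomies (knowledge of result, knowledge about mistakes, knowledge about how to proceed); the templates and the build_prompt helper are hypothetical stand-ins, not the paper's actual prompts.

```python
# Illustrative sketch: composing prompts that request one feedback type at a time.
# The templates below are assumptions for illustration, not the study's prompts.

FEEDBACK_TYPE_TEMPLATES = {
    "knowledge_of_result": (
        "State only whether the student's program solves the task. "
        "Do not explain errors or suggest fixes."
    ),
    "knowledge_about_mistakes": (
        "List the errors in the student's program and where they occur. "
        "Do not provide corrected code."
    ),
    "knowledge_about_how_to_proceed": (
        "Give the student one hint about the next step to improve their program. "
        "Do not reveal the full solution."
    ),
}

def build_prompt(task_spec: str, student_code: str, feedback_type: str) -> str:
    """Bundle task, submission, and a type-specific instruction into one prompt."""
    return (
        f"Task description:\n{task_spec}\n\n"
        f"Student submission:\n{student_code}\n\n"
        f"Instruction: {FEEDBACK_TYPE_TEMPLATES[feedback_type]}"
    )

print(build_prompt(
    "Write a function that returns the sum of a list of integers.",
    "def total(xs):\n    s = 0\n    for x in xs:\n        s = x\n    return s",
    "knowledge_about_how_to_proceed",
))
```

Constraining each prompt to a single feedback type is what makes the generated output comparable against a taxonomy, rather than letting the model mix verification, error explanation, and hints freely.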


Feedback-Generation for Programming Exercises With GPT-4

arXiv.org Artificial Intelligence

Ever since Large Language Models (LLMs) and related applications have become broadly available, several studies investigated their potential for assisting educators and supporting students in higher education. LLMs such as Codex, GPT-3.5, and GPT-4 have shown promising results in the context of large programming courses, where students can benefit from feedback and hints if provided timely and at scale. This paper explores the quality of GPT-4 Turbo's generated output for prompts containing both the programming task specification and a student's submission as input. Two assignments from an introductory programming course were selected, and GPT-4 was asked to generate feedback for 55 randomly chosen, authentic student programming submissions. The output was qualitatively analyzed regarding correctness, personalization, fault localization, and other features identified in the material. Compared to prior work and analyses of GPT-3.5, GPT-4 Turbo shows notable improvements. For example, the output is more structured and consistent. GPT-4 Turbo can also accurately identify invalid casing in student programs' output. In some cases, the feedback also includes the output of the student program. At the same time, inconsistencies were noted, such as feedback stating that the submission is correct while also pointing out an error that needs to be fixed. The present work increases our understanding of LLMs' potential and limitations, of how to integrate them into e-assessment systems and pedagogical scenarios, and of how to instruct students who are using applications based on GPT-4.
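
The pipeline the paper describes, a single prompt bundling the task specification and a student's submission sent to GPT-4 Turbo, can be sketched in a few lines. This is a minimal sketch assuming the openai Python package (v1+) and an OPENAI_API_KEY in the environment; the model name, system message, and temperature are illustrative choices, not the study's exact configuration.

```python
# Minimal sketch of the feedback-generation pipeline described above.
# Model name, system message, and temperature are assumptions for illustration.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_feedback(task_spec: str, submission: str) -> str:
    """Request formative feedback for one student submission."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        temperature=0,  # dampen run-to-run variation in the generated feedback
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a tutor in an introductory programming course. "
                    "Give formative feedback on the student's submission; "
                    "do not write the solution for them."
                ),
            },
            {
                "role": "user",
                "content": f"Task:\n{task_spec}\n\nStudent submission:\n{submission}",
            },
        ],
    )
    return response.choices[0].message.content
```

Running such a function over a batch of authentic submissions yields the kind of corpus the study analyzed qualitatively for correctness, personalization, and fault localization.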


Analyzing Chat Protocols of Novice Programmers Solving Introductory Programming Tasks with ChatGPT

arXiv.org Artificial Intelligence

The increasing need for competent computing graduates proficient in programming, software development, and related technical competencies [Ca17] is one of the factors exacerbating pressure on higher education institutions to offer high quality, competency-based education [Ra21]. However, the latter requires extensive resources, mentoring, and, for example, formative feedback for learners, especially in introductory programming classes [Je22; Lo24]. This is due to the fact that novices experience a number of challenges in the process, which have been subject to extensive research in the past decades [Du86; Lu18; SS86]. Among them are cognitively demanding competencies [Ki20; Ki24], such as problem understanding, designing and writing algorithms, debugging, and understanding error messages [Du86; ER16; Ki20; Lu18; SS86]. Educators' expectations towards novice learners and what they can achieve in their first semester(s) seem to be too high and unrealistic [Lu16; Lu18; WCL07]. Moreover, the student-educator ratio in introductory programming classes keeps increasing in German higher education institutions, thereby limiting resources to provide feedback and hints, and to adequately address heterogeneous prior knowledge and diverse educational biographies [Pe16; SB22].


The Robots are Here: Navigating the Generative AI Revolution in Computing Education

arXiv.org Artificial Intelligence

Recent advancements in artificial intelligence (AI) are fundamentally reshaping computing, with large language models (LLMs) now effectively being able to generate and interpret source code and natural language instructions. These emergent capabilities have sparked urgent questions in the computing education community around how educators should adapt their pedagogy to address the challenges and to leverage the opportunities presented by this new technology. In this working group report, we undertake a comprehensive exploration of LLMs in the context of computing education and make five significant contributions. First, we provide a detailed review of the literature on LLMs in computing education and synthesise findings from 71 primary articles. Second, we report the findings of a survey of computing students and instructors from across 20 countries, capturing prevailing attitudes towards LLMs and their use in computing education contexts. Third, to understand how pedagogy is already changing, we offer insights collected from in-depth interviews with 22 computing educators from five continents who have already adapted their curricula and assessments. Fourth, we use the ACM Code of Ethics to frame a discussion of ethical issues raised by the use of large language models in computing education, and we provide concrete advice for policy makers, educators, and students. Finally, we benchmark the performance of LLMs on various computing education datasets, and highlight the extent to which the capabilities of current models are rapidly improving. Our aim is that this report will serve as a focal point for both researchers and practitioners who are exploring, adapting, using, and evaluating LLMs and LLM-based tools in computing classrooms.


Exploring the Potential of Large Language Models to Generate Formative Programming Feedback

arXiv.org Artificial Intelligence

Ever since the emergence of large language models (LLMs) and related applications, such as ChatGPT, their performance and errors on programming tasks have been subject to research. In this work-in-progress paper, we explore the potential of such LLMs for computing educators and learners, as we analyze the feedback they generate in response to a given input containing program code. In particular, we aim at (1) exploring how an LLM like ChatGPT responds to students seeking help with their introductory programming tasks, and (2) identifying feedback types in its responses. To achieve these goals, we used students' programming sequences from a dataset gathered within a CS1 course as input for ChatGPT, along with questions required to elicit feedback and correct solutions. The results show that ChatGPT performs reasonably well for some of the introductory programming tasks and student errors, which means that students can potentially benefit. However, educators should provide guidance on how to use the provided feedback, as it can contain misleading information for novices.
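
The second goal, identifying feedback types in the responses, was addressed qualitatively in the paper. Purely as a toy illustration of the idea, a keyword-based tagger could look like the sketch below; the cue patterns and type names are hypothetical and would miss much of what a human coder catches.

```python
# Toy sketch: tagging feedback types in an LLM response via surface cues.
# The paper identified types qualitatively; this heuristic is a hypothetical
# stand-in to make the idea concrete, not the authors' method.

import re

TYPE_CUES = {
    "knowledge_of_result": r"\b(correct|incorrect|passes|fails)\b",
    "knowledge_about_mistakes": r"\b(error|bug|mistake|exception)\b",
    "knowledge_about_how_to_proceed": r"\b(try|consider|hint|instead|next step)\b",
}

def tag_feedback_types(response_text: str) -> list[str]:
    """Return the feedback types whose cue words appear in the response."""
    text = response_text.lower()
    return [t for t, pattern in TYPE_CUES.items() if re.search(pattern, text)]

print(tag_feedback_types(
    "There is a bug in your loop: s is overwritten instead of accumulated; "
    "consider using s += x."
))
# -> ['knowledge_about_mistakes', 'knowledge_about_how_to_proceed']
```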


Large Language Models in Introductory Programming Education: ChatGPT's Performance and Implications for Assessments

arXiv.org Artificial Intelligence

The advent of Large Language Models (LLMs), such as OpenAI's ChatGPT, Codex, and GitHub's Copilot, affects the educational landscape at its core, as LLMs offer entirely new possibilities, but also challenges for educators, learners, and institutions. Even though LLMs have only recently become available to a broader audience, research has started to address their implications for computing education, particularly programming. The generative potential may be used by educators for the design of new programming tasks [Sa22], or by students to gather formative feedback [Ka23, Zh22]. At the same time, implications for programming pedagogy and assessments are being discussed [Be23, BK23, RTT23], as the low-threshold availability of LLMs raises new questions with regard to adequate task designs, students' contribution, plagiarism, and ethical conduct. Educators and institutions will soon need to reconsider the design of (formative) assessments. In this context, it is crucial to investigate the capabilities and limitations of LLMs for novice learners of programming, whose challenges have a well-documented history [SS86, Mc01, Lu18].