
 Stamper, John


Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions

arXiv.org Artificial Intelligence

Knowledge Components (KCs) linked to assessments enhance the measurement of student learning, enrich analytics, and facilitate adaptivity. However, generating and linking KCs to assessment items requires significant effort and domain-specific knowledge. To streamline this process for higher-education courses, we employed GPT-4 to generate KCs for multiple-choice questions (MCQs) in Chemistry and E-Learning. We analyzed discrepancies between the KCs generated by the Large Language Model (LLM) and those created by humans, using evaluations from three domain experts in each subject area. This evaluation aimed to determine whether, in instances of non-matching KCs, evaluators preferred the LLM-generated KCs over their human-created counterparts. We also developed an ontology induction algorithm to cluster questions that assess similar KCs based on their content. Our most effective LLM strategy accurately matched KCs for 56% of Chemistry and 35% of E-Learning MCQs, with even higher success when considering the top five KC suggestions. Human evaluators favored LLM-generated KCs, choosing them over human-assigned ones approximately two-thirds of the time, a preference that was statistically significant across both domains. Our clustering algorithm successfully grouped questions by their underlying KCs without needing explicit labels or contextual information. This research advances the automation of KC generation and classification for assessment items, alleviating the need for student data or predefined KC labels.
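
To make the clustering idea concrete, here is a minimal sketch of grouping MCQs by textual similarity so that questions assessing similar KCs fall into the same cluster, without KC labels or student data. This is not the paper's ontology induction algorithm; the TF-IDF features, cosine linkage, and distance threshold are assumptions (scikit-learn >= 1.2 is assumed for the `metric` argument).

```python
# Illustrative sketch only: cluster MCQ text so that questions assessing
# similar knowledge components end up in the same group.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

questions = [
    "Which element has the atomic number 6?",
    "What is the atomic number of carbon?",
    "Which learning theory underlies spaced practice?",
]

# Embed question text; no KC labels or contextual information are required.
X = TfidfVectorizer(stop_words="english").fit_transform(questions).toarray()

# Group questions whose content is sufficiently similar.
clustering = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=1.0,   # assumed value; tune per dataset
    metric="cosine",
    linkage="average",
).fit(X)

for question, label in zip(questions, clustering.labels_):
    print(label, question)
```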


An Automatic Question Usability Evaluation Toolkit

arXiv.org Artificial Intelligence

Evaluating multiple-choice questions (MCQs) involves either labor-intensive human assessments or automated methods that prioritize readability, often overlooking deeper question design flaws. To address this issue, we introduce the Scalable Automatic Question Usability Evaluation Toolkit (SAQUET), an open-source tool that leverages the Item-Writing Flaws (IWF) rubric for a comprehensive and automated quality evaluation of MCQs. By harnessing the latest large language models such as GPT-4, advanced word embeddings, and Transformers designed to analyze textual complexity, SAQUET effectively pinpoints and assesses a wide array of flaws in MCQs. We first demonstrate the discrepancy between commonly used automated evaluation metrics and human assessment of MCQ quality. Then we evaluate SAQUET on a diverse dataset of MCQs across the five domains of Chemistry, Statistics, Computer Science, Humanities, and Healthcare, showing how it effectively distinguishes between flawed and flawless questions and provides a level of analysis beyond what is achievable with traditional metrics. With an accuracy rate of over 94% in detecting the presence of flaws identified by human evaluators, our findings highlight the limitations of existing evaluation methods and showcase the toolkit's potential to improve the quality of educational assessments.
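
As a rough illustration of the LLM-based part of such a pipeline, the sketch below asks GPT-4 whether a question exhibits a handful of item-writing flaws. It assumes the openai >= 1.0 Python client; the prompt wording and the flaw list are illustrative and not SAQUET's actual rubric implementation.

```python
# Minimal sketch of an LLM-based item-writing-flaw (IWF) check.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FLAWS = ["implausible distractors", "negatively worded stem",
         "all-of-the-above option", "clues to the correct answer"]

def check_flaws(stem: str, options: list[str]) -> str:
    prompt = (
        "You are reviewing a multiple-choice question for item-writing flaws.\n"
        f"Flaws to check: {', '.join(FLAWS)}.\n"
        f"Stem: {stem}\nOptions: {options}\n"
        "For each flaw, answer 'present' or 'absent' with a one-line reason."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep the grading as deterministic as possible
    )
    return response.choices[0].message.content
```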


Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices

arXiv.org Artificial Intelligence

Recent studies have integrated large language models (LLMs) into diverse educational contexts, including providing adaptive programming hints, a type of feedback that helps students move forward during problem-solving. However, most existing LLM-based hint systems are limited to a single hint type. To investigate whether and how different levels of hints can support students' problem-solving and learning, we conducted a think-aloud study with 12 novices using the LLM Hint Factory, a system providing four levels of hints, from general natural language guidance to concrete code assistance, varying in format and granularity. We discovered that high-level natural language hints alone can be unhelpful or even misleading, especially when addressing next-step or syntax-related help requests. Adding lower-level hints, like code examples with in-line comments, can better support students. The findings open up future work on customizing help responses across content, format, and granularity to accurately identify and meet students' learning needs.
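
A simple way to picture multi-level hints is as a set of prompt templates of increasing specificity. The level names and templates below are assumptions for illustration, not the actual LLM Hint Factory implementation.

```python
# Illustrative sketch: compose LLM prompts for hints of different granularity,
# from general natural language guidance down to concrete code assistance.
HINT_LEVELS = {
    "orientation": "Give a one-sentence, high-level pointer about what to focus on next.",
    "strategy": "Describe the next step in plain language, without code.",
    "example": "Show a short, commented code example illustrating the next step.",
    "bottom_out": "Provide the exact code the student should write next, with in-line comments.",
}

def build_hint_prompt(level: str, problem: str, student_code: str) -> str:
    """Compose an LLM prompt for the requested hint level."""
    instruction = HINT_LEVELS[level]
    return (
        f"Problem statement:\n{problem}\n\n"
        f"Student's current code:\n{student_code}\n\n"
        f"Hint request: {instruction}"
    )
```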


Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods

arXiv.org Artificial Intelligence

Multiple-choice questions with item-writing flaws can negatively impact student learning and skew analytics. These flaws are often present in student-generated questions, making it difficult to assess their quality and suitability for classroom use. Existing methods for evaluating multiple-choice questions often focus on machine-readability metrics, without considering their intended use within course materials and their pedagogical implications. In this study, we compared the performance of a rule-based method we developed to a machine-learning-based method utilizing GPT-4 for the task of automatically assessing multiple-choice questions based on 19 common item-writing flaws. By analyzing 200 student-generated questions from four different subject areas, we found that the rule-based method correctly detected 91% of the flaws identified by human annotators, compared to 79% for GPT-4. We demonstrated the effectiveness of the two methods in identifying common item-writing flaws present in student-generated questions across different subject areas. The rule-based method can accurately and efficiently evaluate multiple-choice questions from multiple domains, outperforming GPT-4 and going beyond existing metrics that do not account for the educational use of such questions. Finally, we discuss the potential for using these automated methods to improve the quality of questions based on the identified flaws.
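
For a flavor of what a rule-based checker can look like, here is a minimal sketch covering three commonly cited item-writing flaws. The specific heuristics and thresholds are assumptions; the method described above covers 19 flaws with more careful rules.

```python
# Illustrative rule-based checks for a few item-writing flaws.
def detect_flaws(stem: str, options: list[str], answer_index: int) -> list[str]:
    flaws = []
    # Flaw: "all of the above" / "none of the above" style options.
    if any(opt.strip().lower() in {"all of the above", "none of the above"}
           for opt in options):
        flaws.append("all/none-of-the-above option")
    # Flaw: the correct option is conspicuously longer than the distractors.
    lengths = [len(opt) for opt in options]
    avg_distractor = (sum(lengths) - lengths[answer_index]) / (len(options) - 1)
    if lengths[answer_index] > 1.5 * avg_distractor:
        flaws.append("longest option is the correct answer")
    # Flaw: negatively worded stem (e.g., NOT, EXCEPT).
    if any(word in stem.lower().split() for word in ("not", "except")):
        flaws.append("negatively worded stem")
    return flaws

print(detect_flaws(
    "Which of the following is NOT a noble gas?",
    ["Neon", "Argon", "Nitrogen, which belongs to group 15 of the periodic table", "Helium"],
    2,
))
```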


Learnersourcing in the Age of AI: Student, Educator and Machine Partnerships for Content Creation

arXiv.org Artificial Intelligence

Our increasingly connected world is empowering learners and enabling exciting new pedagogies. In particular, educational tools that facilitate collaboration between students can help to foster a wide range of social and domain-specific skills (Jeong, Hmelo-Silver and Jo, 2019). The literature on computer supported collaborative learning documents a diverse range of pedagogies that have been applied for decades in many subject domains and educational levels (Lehtinen, Hakkarainen, Lipponen, Rahikainen and Muukkonen, 1999; Roberts, 2005; Kaliisa, Rienties, Mørch and Kluge, 2022). One recent approach, derived from foundational work on contributing student pedagogies (Collis and Moonen, 2002; Hamer, Sheard, Purchase and Luxton-Reilly, 2012), involves students creating and sharing learning resources with one another. Such activities have gained popularity in recent years and are associated with two broad types of benefits. Firstly, creating learning content is a cognitively demanding task that requires students to engage deeply with course concepts and exhibit behaviours at the highest level of Bloom's taxonomy of educational objectives (Hilton, Goldwater, Hancock, Clemson, Huang and Denyer, 2022). Secondly, leveraging the creative power of many students can result in the rapid and cost-effective creation of large repositories of learning resources that can, in turn, be used for practice and to support personalized learning experiences (Singh, Brooks, Lin and Li, 2021). Learnersourcing is a commonly used term to describe the practice of having students work collaboratively to generate shared learning resources (Kim, 2015). It is related to the more general task of crowdsourcing, in which tasks are outsourced to a pool of participants, often drawn from large and undefined populations, each of whom makes a small contribution to some product.


New Potentials for Data-Driven Intelligent Tutoring System Development and Optimization

AI Magazine

Increasingly widespread use of educational technologies is producing vast amounts of data. Such data can be used to help advance our understanding of student learning and enable more intelligent, interactive, engaging, and effective education. In this article, we discuss the status and prospects of this new and powerful opportunity for data-driven development and optimization of educational technologies, focusing on intelligent tutoring systems. We provide examples of the use of a variety of techniques to develop or optimize the select, evaluate, suggest, and update functions of intelligent tutors, including probabilistic grammar learning, rule induction, Markov decision processes, classification, and integrations of symbolic search and statistical inference.
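
To illustrate one of the techniques named above, the toy sketch below runs value iteration on a Markov decision process whose states could correspond to solution steps observed in student log data, yielding a policy for suggesting the next step or hint. The states, transition probabilities, and rewards are illustrative assumptions, not data from any tutor.

```python
# Toy value-iteration sketch for an MDP-based "suggest" function.
GAMMA = 0.9  # discount factor

# P[state][action] -> list of (probability, next_state, reward) tuples
P = {
    "start":   {"hint":  [(1.0, "partial", 0.0)],
                "solve": [(0.3, "done", 1.0), (0.7, "start", 0.0)]},
    "partial": {"hint":  [(1.0, "done", 1.0)],
                "solve": [(0.6, "done", 1.0), (0.4, "partial", 0.0)]},
    "done":    {},  # terminal state
}

# Value iteration: repeatedly back up state values until they stabilize.
V = {s: 0.0 for s in P}
for _ in range(100):
    for s, actions in P.items():
        if actions:
            V[s] = max(sum(p * (r + GAMMA * V[s2]) for p, s2, r in outcomes)
                       for outcomes in actions.values())

# Greedy policy: in each state, suggest the action with the highest value.
policy = {s: max(actions, key=lambda a: sum(p * (r + GAMMA * V[s2])
                                            for p, s2, r in actions[a]))
          for s, actions in P.items() if actions}
print(V, policy)
```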

