Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing

Ozyurt, Yilmazcan, Feuerriegel, Stefan, Sachan, Mrinmaya

arXiv.org Artificial Intelligence 

Knowledge tracing (KT) is a popular approach for modeling students' learning progress over time, which can enable more personalized and adaptive learning. However, existing KT approaches face two major limitations: (1) they rely heavily on expert-defined knowledge concepts (KCs) in questions, which is time-consuming and prone to errors; and (2) KT methods tend to overlook the semantics of both questions and the given KCs. In this work, we address these challenges and present KCQRL, a framework for automated knowledge concept annotation and question representation learning that can improve the effectiveness of any existing KT model. First, we propose an automated KC annotation process using large language models (LLMs), which generates question solutions and then annotates KCs in each solution step of the questions. Second, we introduce a contrastive learning approach to generate semantically rich embeddings for questions and solution steps, aligning them with their associated KCs via a tailored false negative elimination approach. These embeddings can be readily integrated into existing KT models, replacing their randomly initialized embeddings. We demonstrate the effectiveness of KCQRL across 15 KT models on two large real-world Math learning datasets, where we achieve consistent performance improvements.

Recent years have witnessed a surge in online learning platforms (Adedoyin & Soykan, 2023; Gros & García-Peñalvo, 2023), where students learn new knowledge concepts, which are then tested through exercises. Needless to say, personalization is crucial for effective learning: it ensures that new knowledge concepts are carefully tailored to the current knowledge state of the student, which is more effective than one-size-fits-all approaches to learning (Cui & Sachan, 2023; Xu et al., 2024). However, such personalization requires that the knowledge of students is continuously assessed, which highlights the need for knowledge tracing (KT). In KT, one models the temporal dynamics of students' learning processes (Corbett & Anderson, 1994) in terms of a core set of skills, which are called knowledge concepts (KCs). KT models are typically time-series models that receive the past interactions of the learner as input (e.g., her previous exercises) in order to predict the learner's response to the next exercise.

Yet, existing KT models have two main limitations that hinder their applicability in practice (see Figure 1). (1) They require a comprehensive mapping between KCs and questions, which is typically done through manual annotation by experts. However, such KC annotation is both time-intensive and prone to errors (Clark, 2014; Bier et al., 2019). (2) KT models overlook the semantics of both questions and KCs.
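To make the contrastive alignment step more concrete, the sketch below shows one plausible instantiation under our own assumptions: an InfoNCE-style loss that pulls each solution-step (or question) embedding toward the embedding of its annotated KC, while dropping in-batch candidates that share the same KC label from the negatives (a simple form of false negative elimination). The function name, masking rule, and temperature are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_kc_alignment_loss(step_embs, kc_embs, kc_labels, temperature=0.1):
    """InfoNCE-style alignment of solution-step embeddings with KC embeddings.

    step_embs: (N, d) embeddings of solution steps (or questions)
    kc_embs:   (N, d) embeddings of the KC annotated for each step
    kc_labels: (N,)   integer KC ids; in-batch candidates sharing the anchor's
               KC are treated as false negatives and masked out
    """
    step_embs = F.normalize(step_embs, dim=-1)
    kc_embs = F.normalize(kc_embs, dim=-1)
    logits = step_embs @ kc_embs.T / temperature            # (N, N) similarities

    # False negative elimination (illustrative): candidates with the same KC
    # label as the anchor are excluded from the denominator, except for the
    # diagonal entry, which is the positive pair.
    same_kc = kc_labels.unsqueeze(0) == kc_labels.unsqueeze(1)
    not_diag = ~torch.eye(len(kc_labels), dtype=torch.bool, device=logits.device)
    logits = logits.masked_fill(same_kc & not_diag, float("-inf"))

    targets = torch.arange(len(step_embs), device=logits.device)
    return F.cross_entropy(logits, targets)
```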
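Likewise, the claim that the learned embeddings can replace the randomly initialized question embeddings of existing KT models can be illustrated with a minimal DKT-style recurrent model. This is a generic sketch of a sequence model over past interactions, not the paper's architecture; the class name, dimensions, and the `pretrained_question_embs` argument are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MinimalKTModel(nn.Module):
    """DKT-style knowledge tracing model (illustrative sketch).

    Given a student's past interactions (question id, correctness), it predicts
    the probability of answering the next question correctly. The question
    embedding table is the component that pretrained, semantically rich
    embeddings would replace; by default it is randomly initialized.
    """

    def __init__(self, num_questions, emb_dim=64, hidden_dim=128,
                 pretrained_question_embs=None):
        super().__init__()
        self.question_emb = nn.Embedding(num_questions, emb_dim)
        if pretrained_question_embs is not None:
            # Plug in pretrained question embeddings instead of random init.
            self.question_emb.weight.data.copy_(pretrained_question_embs)
        self.response_emb = nn.Embedding(2, emb_dim)  # 0 = wrong, 1 = correct
        self.rnn = nn.LSTM(emb_dim * 2, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim + emb_dim, 1)

    def forward(self, past_questions, past_correct, next_questions):
        # past_questions, past_correct, next_questions: (batch, seq_len) ids
        x = torch.cat([self.question_emb(past_questions),
                       self.response_emb(past_correct)], dim=-1)
        h, _ = self.rnn(x)                            # (batch, seq_len, hidden_dim)
        q_next = self.question_emb(next_questions)
        logits = self.out(torch.cat([h, q_next], dim=-1)).squeeze(-1)
        return torch.sigmoid(logits)                  # P(correct) at each step
```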