Borchers, Conrad
Predicting Learning Performance with Large Language Models: A Study in Adult Literacy
Zhang, Liang, Lin, Jionghao, Borchers, Conrad, Sabatini, John, Hollander, John, Cao, Meng, Hu, Xiangen
Intelligent Tutoring Systems (ITSs) have significantly enhanced adult literacy training, a key factor for societal participation, employment opportunities, and lifelong learning. Our study investigates the application of advanced AI models, including Large Language Models (LLMs) such as GPT-4, for predicting learning performance in adult literacy programs delivered through ITSs. This research is motivated by the potential of LLMs to predict learning performance based on their inherent reasoning and computational capabilities. Using reading comprehension datasets from the AutoTutor ITS, we evaluate the predictive capabilities of GPT-4 against traditional machine learning methods under five-fold cross-validation. Our findings show that GPT-4 offers predictive performance competitive with traditional machine learning methods such as Bayesian Knowledge Tracing, Performance Factor Analysis, Sparse Factor Analysis Lite (SPARFA-Lite), tensor factorization, and eXtreme Gradient Boosting (XGBoost). While XGBoost trained on a local machine outperforms GPT-4 in predictive accuracy, an XGBoost model selected and subsequently tuned by GPT-4 on the GPT-4 platform demonstrates superior performance compared to local execution. Moreover, using XGBoost as a case study, our comparison of hyperparameter tuning by GPT-4 versus grid search suggests comparable performance, albeit with less stability in the automated approach. Our study contributes to the field by highlighting the potential of integrating LLMs with traditional machine learning models to enhance predictive accuracy and personalize adult literacy education, setting a foundation for future research on applying LLMs within ITSs.
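As a minimal sketch (not the paper's code) of the five-fold cross-validation setup described above, the snippet below evaluates an XGBoost baseline on binary correct/incorrect learner responses; the features and data are hypothetical stand-ins for AutoTutor performance records.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
# Hypothetical features: learner ability, item difficulty, attempt number
X = rng.normal(size=(500, 3))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

aucs = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
    model.fit(X[train_idx], y[train_idx])
    aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))

print(f"Mean 5-fold AUC: {np.mean(aucs):.3f}")
```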
Improving Assessment of Tutoring Practices using Retrieval-Augmented Generation
Han, Zifei FeiFei, Lin, Jionghao, Gurung, Ashish, Thomas, Danielle R., Chen, Eason, Borchers, Conrad, Gupta, Shivang, Koedinger, Kenneth R.
One-on-one tutoring is an effective instructional method for enhancing learning, yet its efficacy hinges on tutor competencies. Novice math tutors often prioritize content-specific guidance while neglecting aspects such as social-emotional learning. Social-emotional learning promotes equity, inclusion, and nurturing relationships with students, which are crucial for holistic student development. Assessing tutor competencies accurately and efficiently can drive the development of tailored tutor training programs. However, evaluating novice tutor ability during real-time tutoring remains challenging, as it typically requires experts in the loop. To address this challenge, this preliminary study harnesses Generative Pre-trained Transformers (GPT), such as the GPT-3.5 and GPT-4 models, to automatically assess tutors' ability to use social-emotional tutoring strategies. The study also reports on the financial dimensions and considerations of employing these models in real time and at scale for automated assessment. We examined four prompting strategies: two basic zero-shot prompts, a Tree of Thought prompt, and a Retrieval-Augmented Generation (RAG) based prompt. The results indicate that the RAG prompt demonstrated more accurate performance (assessed by the level of hallucination and correctness in the generated assessment texts) and lower financial costs than the other strategies evaluated. These findings inform the development of personalized tutor training interventions to enhance the educational effectiveness of tutored learning.
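To illustrate the general shape of a RAG-based prompt (a sketch under our own assumptions, not the paper's implementation), the snippet below retrieves the rubric excerpt most similar to a tutor utterance via TF-IDF and prepends it to the assessment prompt that would be sent to a GPT model; the rubric snippets and utterance are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

rubric_excerpts = [  # hypothetical training-material snippets
    "Effective tutors respond empathetically to student frustration.",
    "Praise should target effort and strategy, not innate ability.",
    "Tutors should invite students to explain their reasoning.",
]
utterance = "Great job sticking with that problem even though it was hard!"

# Rank rubric excerpts by similarity to the utterance and keep the best match.
vec = TfidfVectorizer().fit(rubric_excerpts + [utterance])
sims = cosine_similarity(vec.transform([utterance]), vec.transform(rubric_excerpts))[0]
top_excerpt = rubric_excerpts[sims.argmax()]

prompt = (
    f"Rubric context: {top_excerpt}\n"
    f"Tutor utterance: {utterance}\n"
    "Does this utterance use a social-emotional tutoring strategy? Answer yes/no and justify."
)
print(prompt)  # this prompt would then be sent to GPT-3.5/GPT-4
```

Grounding the model in retrieved rubric text is what the study credits with reducing hallucination relative to the zero-shot and Tree of Thought prompts.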
3DG: A Framework for Using Generative AI for Handling Sparse Learner Performance Data From Intelligent Tutoring Systems
Zhang, Liang, Lin, Jionghao, Borchers, Conrad, Cao, Meng, Hu, Xiangen
Learning performance data (e.g., quiz scores and attempts) are essential for understanding learner engagement and knowledge mastery. However, the learning performance data collected from Intelligent Tutoring Systems (ITSs) often suffer from sparsity, impacting the accuracy of learner modeling and knowledge assessment. To address this, we introduce the 3DG framework (3-Dimensional tensor for Densification and Generation), a novel approach combining tensor factorization with advanced generative models, including the Generative Adversarial Network (GAN) and Generative Pre-trained Transformer (GPT), for enhanced data imputation and augmentation. The framework first represents the data as a three-dimensional tensor capturing the dimensions of learners, questions, and attempts. It then densifies the data through tensor factorization and augments it using generative AI models tailored to individual learning patterns identified via clustering. Applied to data from an AutoTutor lesson by the Center for the Study of Adult Literacy (CSAL), the 3DG framework effectively generated scalable, personalized simulations of learning performance. Comparative analysis revealed the GAN's superior reliability over GPT-4 in this context, underscoring its potential for addressing data sparsity challenges in ITSs and contributing to the advancement of personalized educational technology.
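A minimal sketch of the densification step described above: represent sparse learner-by-question-by-attempt performance as a 3-D tensor and impute missing entries via CP factorization. This assumes the tensorly library's parafac decomposition with a missing-data mask; the shapes and data are hypothetical, and the subsequent GAN/GPT augmentation is not shown.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
tensor = rng.random((20, 10, 5))       # learners x questions x attempts
mask = rng.random(tensor.shape) > 0.6  # True where a score was observed
sparse = tensor * mask

# Factorize using only observed entries, then reconstruct to fill the gaps.
weights, factors = parafac(tl.tensor(sparse), rank=3, mask=tl.tensor(mask.astype(float)))
dense = tl.cp_to_tensor((weights, factors))
print(dense.shape)  # densified tensor, ready for generative augmentation
```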
Revealing Networks: Understanding Effective Teacher Practices in AI-Supported Classrooms using Transmodal Ordered Network Analysis
Borchers, Conrad, Wang, Yeyu, Karumbaiah, Shamya, Ashiq, Muhammad, Shaffer, David Williamson, Aleven, Vincent
Learning analytics research increasingly studies classroom learning with AI-based systems through rich contextual data from outside these systems, especially student-teacher interactions. One key challenge in leveraging such data is generating meaningful insights into effective teacher practices. Quantitative ethnography has the potential to close this gap by combining multimodal data streams into networks of co-occurring behavior that drive insight into favorable learning conditions. The present study uses transmodal ordered network analysis to understand effective teacher practices in relation to traditional metrics of in-system learning in a mathematics classroom working with AI tutors. Incorporating teacher practices captured by position tracking and human observation codes into modeling significantly improved the inference of how efficiently students improved in the AI tutor beyond a model with tutor log data features only. Comparing teacher practices by student learning rates, we find that students with low learning rates exhibited more hint use after teacher monitoring. However, after an extended teacher visit, students with low learning rates showed learning behavior similar to that of their high learning rate peers, achieving repeated correct attempts in the tutor. Observation notes suggest that differences in conceptual and procedural support can help explain visit effectiveness. Taken together, offering early conceptual support to students with low learning rates could make classroom practice with AI tutors more effective. This study advances the scientific understanding of effective teacher practice in classrooms learning with AI tutors and methodologies to make such practices visible.
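As a loose illustration of the directed co-occurrence counting that underlies ordered network analysis (a sketch under our own assumptions, not the study's transmodal model), the snippet below counts ordered pairs of observation codes within a sliding window of recent events; the codes and window size are hypothetical.

```python
from collections import Counter, deque

events = ["monitoring", "hint_use", "correct_attempt", "monitoring", "hint_use"]
window = deque(maxlen=3)  # window of recent events
edges = Counter()

for code in events:
    for prior in window:  # earlier code -> current code, preserving order
        edges[(prior, code)] += 1
    window.append(code)

for (src, dst), n in edges.items():
    print(f"{src} -> {dst}: {n}")
```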
Using Think-Aloud Data to Understand Relations between Self-Regulation Cycle Characteristics and Student Performance in Intelligent Tutoring Systems
Borchers, Conrad, Zhang, Jiayi, Baker, Ryan S., Aleven, Vincent
Numerous studies demonstrate the importance of self-regulation during learning by problem-solving. Recent work in learning analytics has largely examined students' use of SRL in relation to overall learning gains; limited research has related SRL to in-the-moment performance differences among learners. The present study investigates SRL behaviors in relation to learners' moment-by-moment performance while working with an intelligent tutoring system for stoichiometry chemistry. We demonstrate the feasibility of labeling SRL behaviors based on AI-generated think-aloud transcripts, identifying the presence or absence of four SRL categories (processing information, planning, enacting, and realizing errors) in each utterance. Using the SRL codes, we conducted regression analyses to examine how the use of SRL, in terms of presence, frequency, cyclical characteristics, and recency, relates to student performance on subsequent steps in multi-step problems. A model considering students' SRL cycle characteristics outperformed a model using only in-the-moment SRL assessment. In line with theoretical predictions, students' actions during earlier, process-heavy stages of SRL cycles exhibited lower moment-by-moment correctness during problem-solving than later SRL cycle stages. We discuss system re-design opportunities to add SRL support during stages of processing, as well as paths forward for using machine learning to accelerate research that depends on assessing SRL from transcribed think-aloud data.
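A minimal sketch (hypothetical data and coefficients, not the study's model) of the kind of regression described above: relating binary SRL code indicators for the current utterance to the correctness of the next problem-solving step via logistic regression.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
# Binary indicators per utterance: processing, planning, enacting, realizing errors
X = rng.integers(0, 2, size=(n, 4)).astype(float)
# Simulate next-step correctness with hypothetical effect sizes
logits = -0.4 * X[:, 0] - 0.2 * X[:, 1] + 0.5 * X[:, 2] + 0.3
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(model.summary())
```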
Insights into undergraduate pathways using course load analytics
Borchers, Conrad, Pardos, Zachary A.
Compared to K-12, US institutions of higher education, particularly four-year universities, give students a high degree of elective course choice. This choice comes with unique challenges that can inhibit their learning path, such as the choice to overload on credit hours, which causes early undergraduate dropout among older students with prior vocational training and completed degrees [22]. Conversely, low enrollment levels have also been found to be associated with worse educational outcomes, potentially due to a lack of financial and academic support [5]. These findings, though seemingly contradictory, suggest that semester workload may play an important role in explaining the complicated story of student success in higher education. However, recent work has found that credit hours are not a suitable proxy for course workload, capturing only 6% of the variance in student-reported course load compared to 36% captured by LMS features [36]. In this paper, we introduce course load analytics (CLA) as a machine learning approach to producing metrics about course workload relevant to student course selection. This work is the first to predict course load at scale, generalizing to over 10,000 courses at a large public institution and going beyond time load considerations by incorporating more holistic measures such as mental effort and psychological stress. Our findings suggest that the discrepancy between anticipated course load (i.e., as calculated by credit hours) and actual course load (i.e., as estimated by CLA) may be a significant factor in program stop-out.
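To make the supervised learning framing concrete, here is a minimal sketch (not the paper's pipeline) of predicting student-reported course load from LMS activity features; the feature names and data below are hypothetical stand-ins.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_courses = 200
# Hypothetical LMS features: weekly clicks, assignment count, avg. submission hour
X = np.column_stack([
    rng.poisson(120, n_courses),
    rng.poisson(8, n_courses),
    rng.normal(21, 2, n_courses),
])
# Simulated student-reported course load as the prediction target
reported_load = 0.01 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 1, n_courses)

model = GradientBoostingRegressor(random_state=0)
r2 = cross_val_score(model, X, reported_load, cv=5, scoring="r2")
print(f"Mean CV R^2: {r2.mean():.2f}")
```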