posttest
Enhancing nonnative speech perception and production through an AI-powered application
While research on using Artificial Intelligence (AI) through various applications to enhance foreign language pronunciation is expanding, it has primarily focused on aspects such as comprehensibility and intelligibility, largely neglecting the improvement of individual speech sounds in both perception and production. This study seeks to address this gap by examining the impact of training with an AI-powered mobile application on nonnative sound perception and production. Participants completed a pretest assessing their ability to discriminate the second language English heed-hid contrast and produce these vowels in sentence contexts. The intervention involved training with the Speakometer mobile application, which incorporated recording tasks featuring the English vowels, along with pronunciation feedback and practice. The posttest mirrored the pretest to measure changes in performance. The results revealed significant improvements in both discrimination accuracy and production of the target contrast following the intervention. However, participants did not achieve native-like competence. These findings highlight the effectiveness of AI-powered applications in facilitating speech acquisition and support their potential use for personalized, interactive pronunciation training beyond the classroom.
Do Tutors Learn from Equity Training and Can Generative AI Assess It?
Thomas, Danielle R., Borchers, Conrad, Kakarla, Sanjit, Lin, Jionghao, Bhushan, Shambhavi, Guo, Boyuan, Gatz, Erin, Koedinger, Kenneth R.
Equity is a core concern of learning analytics. However, applications that teach and assess equity skills, particularly at scale, are lacking, often due to barriers in evaluating language. Advances in generative AI via large language models (LLMs) are being used in a wide range of applications, with this present work assessing its use in the equity domain. We evaluate tutor performance within an online lesson on enhancing tutors' skills when responding to students in potentially inequitable situations. We apply a mixed-method approach to analyze the performance of 81 undergraduate remote tutors. We find marginally significant learning gains with increases in tutors' self-reported confidence in their knowledge in responding to middle school students experiencing possible inequities from pretest to posttest. Both GPT-4o and GPT-4-turbo demonstrate proficiency in assessing tutors' ability to predict and explain the best approach. Balancing performance, efficiency, and cost, we determine that few-shot learning using GPT-4o is the preferred model. This work makes available a dataset of lesson log data, tutor responses, rubrics for human annotation, and generative AI prompts. Future work involves leveling the difficulty among scenarios and enhancing LLM prompts for large-scale grading and assessment.
Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT
Thomas, Danielle R., Borchers, Conrad, Kakarla, Sanjit, Lin, Jionghao, Bhushan, Shambhavi, Guo, Boyuan, Gatz, Erin, Koedinger, Kenneth R.
The role of multiple-choice questions (MCQs) as effective learning tools has been debated in past research. While MCQs are widely used due to their ease in grading, open response questions are increasingly used for instruction, given advances in large language models (LLMs) for automated grading. This study evaluates MCQs' effectiveness relative to open-response questions, both individually and in combination, on learning. These activities are embedded within six tutor lessons on advocacy. Using a posttest-only randomized control design, we compare the performance of 234 tutors (790 lesson completions) across three conditions: MCQ only, open response only, and a combination of both. We find no significant learning differences across conditions at posttest, but tutors in the MCQ condition took significantly less time to complete instruction. These findings suggest that MCQs are as effective as, and more efficient than, open response tasks for learning when practice time is limited. To further enhance efficiency, we autograded open responses using GPT-4o and GPT-4-turbo. GPT models demonstrate proficiency for purposes of low-stakes assessment, though further research is needed for broader use. This study contributes a dataset of lesson log data, human annotation rubrics, and LLM prompts to promote transparency and reproducibility.
Integrating Attentional Factors and Spacing in Logistic Knowledge Tracing Models to Explore the Impact of Training Sequences on Category Learning
Cao, Meng, Pavlik, Philip I. Jr., Chu, Wei, Zhang, Liang
In category learning, a growing body of literature has increasingly focused on exploring the impacts of interleaving in contrast to blocking. The sequential attention hypothesis posits that interleaving draws attention to the differences between categories while blocking directs attention toward similarities within categories. Although a recent study underscores the joint influence of memory and attentional factors on sequencing effects, there remains a scarcity of effective computational models integrating both attentional and memory considerations to comprehensively understand the effect of training sequences on students' performance. This study introduces a novel integration of attentional factors and spacing into the logistic knowledge tracing (LKT) models to monitor students' performance across different training sequences (interleaving and blocking). Attentional factors were incorporated by recording the counts of comparisons between adjacent trials, considering whether they belong to the same or different category. Several features were employed to account for temporal spacing. We used cross-validations to test the model fit and predictions on the learning session and posttest. Our findings reveal that incorporating both attentional factors and spacing features in the Additive Factors Model (AFM) significantly enhances its capacity to capture the effects of interleaving and blocking and demonstrates superior predictive accuracy for students' learning outcomes. By bridging the gap between attentional factors and memory processes, our computational approach offers a more comprehensive framework for understanding and predicting category learning outcomes in educational settings.
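The attentional features described in this abstract (counts of same-category and different-category comparisons between adjacent trials) can be illustrated with a minimal sketch. The function name, the tuple layout, and the choice to count cumulatively over a single learner's sequence are assumptions made for illustration; the abstract does not specify how the authors encoded these counts in their LKT models.

```python
def adjacent_comparison_counts(categories):
    """Return one (same, different) tuple per trial.

    Each tuple holds the running number of adjacent-trial comparisons,
    up to and including that trial, where the two trials shared a
    category (same) or belonged to different categories (different).
    Hypothetical encoding; not taken from the paper.
    """
    counts = []
    same = 0
    different = 0
    previous = None
    for index, category in enumerate(categories):
        if index > 0:  # the first trial has no preceding trial to compare with
            if category == previous:
                same += 1
            else:
                different += 1
        counts.append((same, different))
        previous = category
    return counts


# Illustrative (made-up) sequences: a blocked and an interleaved schedule.
blocked = ["A", "A", "A", "B", "B", "B"]
interleaved = ["A", "B", "A", "B", "A", "B"]
print(adjacent_comparison_counts(blocked)[-1])
print(adjacent_comparison_counts(interleaved)[-1])
```

Under this encoding, a blocked schedule accumulates mostly same-category comparisons and an interleaved schedule mostly different-category ones, which is the contrast the sequential attention hypothesis relies on.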
Choose Your Own Adventure: Interactive E-Books to Improve Word Knowledge and Comprehension Skills
Day, Stephanie, Hwang, Jin K., Arner, Tracy, McNamara, Danielle, Connor, Carol
The purpose of this feasibility study was to examine the potential impact of reading digital interactive e-books on essential skills that support reading comprehension with third- to fifth-grade students. Students read two e-Books that taught word learning and comprehension monitoring strategies in the service of learning difficult vocabulary and targeted science concepts about hurricanes. We investigated whether specific comprehension strategies, including word learning and strategies that supported general reading comprehension, summarization, and question generation, show promise of effectiveness in building vocabulary knowledge and comprehension skills in the e-Books. Students were assigned to read one of three versions of each of the e-Books; each version implemented one strategy. The books employed a choose-your-adventure format with embedded comprehension questions that provided students with immediate feedback on their responses. Paired samples t-tests were run to examine pre-to-post differences in learning the targeted vocabulary and science concepts taught in both e-Books. For both e-Books, students demonstrated significant gains in word learning and on the targeted hurricane concepts. Additionally, Hierarchical Linear Modeling (HLM) revealed that no one strategy was more associated with larger gains than the others. Performance on the embedded questions in the books was also associated with greater posttest outcomes for both e-Books. This work discusses important considerations for implementation and future development of e-books that can enhance student engagement and improve reading comprehension.
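For readers unfamiliar with the paired samples t-test mentioned in this abstract, the following is a minimal sketch of the statistic computed from pretest and posttest scores. The function name and the example scores are hypothetical and are not data from the study; a full analysis would also report a p-value, which this sketch leaves out.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(pre, post):
    """Return (t, degrees_of_freedom) for paired pre/post scores.

    t is the mean of the per-student differences (post - pre) divided
    by the standard error of those differences.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    if len(pre) < 2:
        raise ValueError("at least two paired observations are required")
    diffs = [after - before for before, after in zip(pre, post)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("differences have zero variance; t is undefined")
    standard_error = spread / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Made-up vocabulary scores for four students, for illustration only.
pretest = [1, 2, 3, 4]
posttest = [2, 4, 5, 8]
t_value, df = paired_t_statistic(pretest, posttest)
print(f"t({df}) = {t_value:.2f}")
```

A positive t indicates that posttest scores were higher on average than pretest scores for the same students.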
Extending Deep Knowledge Tracing: Inferring Interpretable Knowledge and Predicting Post-System Performance
Scruggs, Richard, Baker, Ryan S., McLaren, Bruce M.
Recent student knowledge modeling algorithms such as DKT and DKVMN have been shown to produce accurate predictions of problem correctness within the same learning system. However, these algorithms do not generate estimates of student knowledge. In this paper we present an extension that infers knowledge estimates from correctness predictions. We apply this extension to DKT and DKVMN, resulting in knowledge estimates that correlate better with a posttest than knowledge estimates produced by PFA or BKT. We also apply our extension to correctness predictions from PFA and BKT, finding that knowledge predictions produced with it correlate better with the posttest than BKT and PFA's own knowledge predictions. These findings are significant since the primary aim of education is to prepare students for later experiences outside of the immediate learning activity.
Interactive Concept Maps and Learning Outcomes in Guru
Person, Natalie K. (Rhodes College) | Olney, Andrew M. (University of Memphis) | D'Mello, Sidney K. (University of Notre Dame) | Lehman, Blair A. (University of Memphis)
Concept maps are frequently used in K-12 educational settings. The purpose of this study is to determine whether students’ performance on interactive concept map tasks in Guru, an intelligent tutoring system, is related to immediate and delayed learning outcomes. Guru is a dialogue-based system for high-school biology that intersperses concept map tasks within the tutorial dialogue. Results indicated that when students first attempt to complete concept maps, time spent on the maps may be a good indicator of their understanding, whereas the errors they make on their second attempts with the maps may be an indicator of the knowledge they are lacking. This pattern of results was observed for one cycle of testing, but not replicated in a second cycle. Differences in the findings for the two testing cycles are most likely due to topic variations.
A Comparison of Gains between Educational Games and a Traditional ITS
Jackson, G. Tanner (Arizona State University) | Boonthum-Denecke, Chutima (Hampton University) | McNamara, Danielle S. (Arizona State University)
Intelligent Tutoring Systems (ITSs) have begun to incorporate game-based components in an attempt to balance the learning benefits of ITSs with the motivational benefits of games. iSTART-ME (Motivationally Enhanced) is a new game-based learning environment that was developed on top of an existing ITS (iSTART). In a multi-session lab-based efficacy study with 125 high school students, those students with a low prior reading ability who were trained by a game-based tutoring system (iSTART-ME) or a traditional intelligent tutoring system (iSTART-Regular) performed significantly better on posttest measures than students assigned to a time-delayed control condition. Additionally, the low reading ability students who interacted with the game-based system had a tendency to gain more than students in the traditional ITS system.
Curiosity and the Development of Question Generation Skills
Jirout, Jamie J. (Carnegie Mellon University)
The current study investigates the relationship between children’s curiosity and question-asking ability. Generation of two types of questions was assessed: identification questions (yes/no questions asked to identify a target from an array) and understanding questions (asked to learn more about a topic). The latter was related to children’s curiosity, as was the ability to recognize the effectiveness of questions in solving a mystery. Training on asking identification questions was effective in improving children’s ability to ask that type of question, but did not transfer to the other task. Training on asking understanding questions was not successful. Children’s curiosity did not influence the effectiveness of the training.