Instructional Material
Gaining Insights into Group-Level Course Difficulty via Differential Course Functioning
Baucks, Frederik, Schmucker, Robin, Borchers, Conrad, Pardos, Zachary A., Wiskott, Laurenz
Curriculum Analytics (CA) studies curriculum structure and student data to ensure the quality of educational programs. One desirable property of courses within curricula is that they are not unexpectedly more difficult for students of different backgrounds. While prior work points to likely variations in course difficulty across student groups, robust methodologies for capturing such variations are scarce, and existing approaches do not adequately decouple course-specific difficulty from students' general performance levels. The present study introduces Differential Course Functioning (DCF) as an Item Response Theory (IRT)-based CA methodology. DCF controls for student performance levels and examines whether significant differences exist in how distinct student groups succeed in a given course. Leveraging data from over 20,000 students at a large public university, we demonstrate DCF's ability to detect inequities in undergraduate course difficulty across student groups described by grade achievement. We compare major pairs with high co-enrollment and transfer students to their non-transfer peers. For the former, our findings suggest a link between DCF effect sizes and the alignment of course content to student home department motivating interventions targeted towards improving course preparedness. For the latter, results suggest minor variations in course-specific difficulty between transfer and non-transfer students. While this is desirable, it also suggests that interventions targeted toward mitigating grade achievement gaps in transfer students should encompass comprehensive support beyond enhancing preparedness for individual courses. By providing more nuanced and equitable assessments of academic performance and difficulties experienced by diverse student populations, DCF could support policymakers, course articulation officers, and student advisors.
A General Model for Detecting Learner Engagement: Implementation and Evaluation
Malekshahi, Somayeh, Kheyridoost, Javad M., Fatemi, Omid
Considering learner engagement has a mutual benefit for both learners and instructors. Instructors can help learners increase their attention, involvement, motivation, and interest. On the other hand, instructors can improve their instructional performance by evaluating the cumulative results of all learners and upgrading their training programs. This paper proposes a general, lightweight model for selecting and processing features to detect learners' engagement levels while preserving the sequential temporal relationship over time. During training and testing, we analyzed the videos from the publicly available DAiSEE dataset to capture the dynamic essence of learner engagement. We have also proposed an adaptation policy to find new labels that utilize the affective states of this dataset related to education, thereby improving the models' judgment. The suggested model achieves an accuracy of 68.57\% in a specific implementation and outperforms the studied state-of-the-art models detecting learners' engagement levels.
MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization
Balde, Gunjan, Roy, Soumyadeep, Mondal, Mainack, Ganguly, Niloy
This work presents a dynamic vocabulary adaptation strategy, MEDVOC, for fine-tuning pre-trained language models (PLMs) like BertSumAbs, BART, and PEGASUS for improved medical text summarization. In contrast to existing domain adaptation approaches in summarization, MEDVOC treats vocabulary as an optimizable parameter and optimizes the PLM vocabulary based on fragment score conditioned only on the downstream task's reference summaries. Unlike previous works on vocabulary adaptation (limited only to classification tasks), optimizing vocabulary based on summarization tasks requires an extremely costly intermediate fine-tuning step on large summarization datasets. To that end, our novel fragment score-based hyperparameter search very significantly reduces this fine-tuning time -- from 450 days to less than 2 days on average. Furthermore, while previous works on vocabulary adaptation are often primarily tied to single PLMs, MEDVOC is designed to be deployable across multiple PLMs (with varying model vocabulary sizes, pre-training objectives, and model sizes) -- bridging the limited vocabulary overlap between the biomedical literature domain and PLMs. MEDVOC outperforms baselines by 15.74% in terms of Rouge-L in zero-shot setting and shows gains of 17.29% in high Out-Of-Vocabulary (OOV) concentrations. Our human evaluation shows MEDVOC generates more faithful medical summaries (88% compared to 59% in baselines). We make the codebase publicly available at https://github.com/gb-kgp/MEDVOC.
Responding to Generative AI Technologies with Research-through-Design: The Ryelands AI Lab as an Exploratory Study
Benjamin, Jesse Josua, Lindley, Joseph, Edwards, Elizabeth, Rubegni, Elisa, Korjakow, Tim, Grist, David, Sharkey, Rhiannon
Generative AI technologies demand new practical and critical competencies, which call on design to respond to and foster these. We present an exploratory study guided by Research-through-Design, in which we partnered with a primary school to develop a constructionist curriculum centered on students interacting with a generative AI technology. We provide a detailed account of the design of and outputs from the curriculum and learning materials, finding centrally that the reflexive and prolonged `hands-on' approach led to a co-development of students' practical and critical competencies. From the study, we contribute guidance for designing constructionist approaches to generative AI technology education; further arguing to do so with `critical responsivity.' We then discuss how HCI researchers may leverage constructionist strategies in designing interactions with generative AI technologies; and suggest that Research-through-Design can play an important role as a `rapid response methodology' capable of reacting to fast-evolving, disruptive technologies such as generative AI.
Beyond human subjectivity and error: a novel AI grading system
Gobrecht, Alexandra, Tuma, Felix, Möller, Moritz, Zöller, Thomas, Zakhvatkin, Mark, Wuttig, Alexandra, Sommerfeldt, Holger, Schütt, Sven
The grading of open-ended questions is a high-effort, high-impact task in education. Automating this task promises a significant reduction in workload for education professionals, as well as more consistent grading outcomes for students, by circumventing human subjectivity and error. While recent breakthroughs in AI technology might facilitate such automation, this has not been demonstrated at scale. It this paper, we introduce a novel automatic short answer grading (ASAG) system. The system is based on a fine-tuned open-source transformer model which we trained on large set of exam data from university courses across a large range of disciplines. We evaluated the trained model's performance against held-out test data in a first experiment and found high accuracy levels across a broad spectrum of unseen questions, even in unseen courses. We further compared the performance of our model with that of certified human domain experts in a second experiment: we first assembled another test dataset from real historical exams - the historic grades contained in that data were awarded to students in a regulated, legally binding examination process; we therefore considered them as ground truth for our experiment. We then asked certified human domain experts and our model to grade the historic student answers again without disclosing the historic grades. Finally, we compared the hence obtained grades with the historic grades (our ground truth). We found that for the courses examined, the model deviated less from the official historic grades than the human re-graders - the model's median absolute error was 44 % smaller than the human re-graders', implying that the model is more consistent than humans in grading. These results suggest that leveraging AI enhanced grading can reduce human subjectivity, improve consistency and thus ultimately increase fairness.
Interpretable Tensor Fusion
Varshneya, Saurabh, Ledent, Antoine, Liznerski, Philipp, Balinskyy, Andriy, Mehta, Purvanshi, Mustafa, Waleed, Kloft, Marius
Conventional machine learning methods are predominantly designed to predict outcomes based on a single data type. However, practical applications may encompass data of diverse types, such as text, images, and audio. We introduce interpretable tensor fusion (InTense), a multimodal learning method for training neural networks to simultaneously learn multimodal data representations and their interpretable fusion. InTense can separately capture both linear combinations and multiplicative interactions of diverse data types, thereby disentangling higher-order interactions from the individual effects of each modality. InTense provides interpretability out of the box by assigning relevance scores to modalities and their associations. The approach is theoretically grounded and yields meaningful relevance scores on multiple synthetic and real-world datasets. Experiments on six real-world datasets show that InTense outperforms existing state-of-the-art multimodal interpretable approaches in terms of accuracy and interpretability.
Evaluating and Optimizing Educational Content with Large Language Model Judgments
He-Yueya, Joy, Goodman, Noah D., Brunskill, Emma
Creating effective educational materials generally requires expensive and time-consuming studies of student learning outcomes. To overcome this barrier, one idea is to build computational models of student learning and use them to optimize instructional materials. However, it is difficult to model the cognitive processes of learning dynamics. We propose an alternative approach that uses Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes. Specifically, we use GPT-3.5 to evaluate the overall effect of instructional materials on different student groups and find that it can replicate well-established educational findings such as the Expertise Reversal Effect and the Variability Effect. This demonstrates the potential of LMs as reliable evaluators of educational content. Building on this insight, we introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function. We apply this approach to create math word problem worksheets aimed at maximizing student learning gains. Human teachers' evaluations of these LM-generated worksheets show a significant alignment between the LM judgments and human teacher preferences. We conclude by discussing potential divergences between human and LM opinions and the resulting pitfalls of automating instructional design.
FOKE: A Personalized and Explainable Education Framework Integrating Foundation Models, Knowledge Graphs, and Prompt Engineering
Integrating large language models (LLMs) and knowledge graphs (KGs) holds great promise for revolutionizing intelligent education, but challenges remain in achieving personalization, interactivity, and explainability. We propose FOKE, a Forest Of Knowledge and Education framework that synergizes foundation models, knowledge graphs, and prompt engineering to address these challenges. FOKE introduces three key innovations: (1) a hierarchical knowledge forest for structured domain knowledge representation; (2) a multi-dimensional user profiling mechanism for comprehensive learner modeling; and (3) an interactive prompt engineering scheme for generating precise and tailored learning guidance. We showcase FOKE's application in programming education, homework assessment, and learning path planning, demonstrating its effectiveness and practicality. Additionally, we implement Scholar Hero, a real-world instantiation of FOKE. Our research highlights the potential of integrating foundation models, knowledge graphs, and prompt engineering to revolutionize intelligent education practices, ultimately benefiting learners worldwide. FOKE provides a principled and unified approach to harnessing cutting-edge AI technologies for personalized, interactive, and explainable educational services, paving the way for further research and development in this critical direction.
Model- and Data-Based Control of Self-Balancing Robots: Practical Educational Approach with LabVIEW and Arduino
Abdelgawad, Abdelrahman, Shohdy, Tarek, Nada, Ayman
A two-wheeled self-balancing robot (TWSBR) is non-linear and unstable system. This study compares the performance of model-based and data-based control strategies for TWSBRs, with an explicit practical educational approach. Model-based control (MBC) algorithms such as Lead-Lag and PID control require a proficient dynamic modeling and mathematical manipulation to drive the linearized equations of motions and develop the appropriate controller. On the other side, data-based control (DBC) methods, like fuzzy control, provide a simpler and quicker approach to designing effective controllers without needing in-depth understanding of the system model. In this paper, the advantages and disadvantages of both MBC and DBC using a TWSBR are illustrated. All controllers were implemented and tested on the OSOYOO self-balancing kit, including an Arduino microcontroller, MPU-6050 sensor, and DC motors. The control law and the user interface are constructed using the LabVIEW-LINX toolkit. A real-time hardware-in-loop experiment validates the results, highlighting controllers that can be implemented on a cost-effective platform.
A Lightweight Neural Architecture Search Model for Medical Image Classification
Xie, Lunchen, Lomurno, Eugenio, Gambella, Matteo, Ardagna, Danilo, Roveri, Manuel, Matteucci, Matteo, Shi, Qingjiang
Accurate classification of medical images is essential for modern diagnostics. Deep learning advancements led clinicians to increasingly use sophisticated models to make faster and more accurate decisions, sometimes replacing human judgment. However, model development is costly and repetitive. Neural Architecture Search (NAS) provides solutions by automating the design of deep learning architectures. This paper presents ZO-DARTS+, a differentiable NAS algorithm that improves search efficiency through a novel method of generating sparse probabilities by bi-level optimization. Experiments on five public medical datasets show that ZO-DARTS+ matches the accuracy of state-of-the-art solutions while reducing search times by up to three times.