Instructional Material
Towards Actionable Pedagogical Feedback: A Multi-Perspective Analysis of Mathematics Teaching and Tutoring Dialogue
Naim, Jannatun, Cao, Jie, Tasneem, Fareen, Jacobs, Jennifer, Milne, Brent, Martin, James, Sumner, Tamara
Effective feedback is essential for refining instructional practices in mathematics education, and researchers often turn to advanced natural language processing (NLP) models to analyze classroom dialogues from multiple perspectives. However, utterance-level discourse analysis faces two primary challenges: (1) multifunctionality, where a single utterance may serve multiple purposes that a single tag cannot capture, and (2) the exclusion of many utterances from domain-specific discourse move classifications, which leads to their omission from feedback. To address these challenges, we propose a multi-perspective discourse analysis that integrates domain-specific talk moves with dialogue acts (using the flattened multi-functional SWBD-MASL schema with 43 tags) and discourse relations (applying Segmented Discourse Representation Theory with 16 relations). Our top-down analysis framework enables a comprehensive understanding of utterances that contain talk moves as well as utterances that do not. We apply it to two mathematics education datasets: TalkMoves (teaching) and SAGA22 (tutoring). Through distributional unigram analysis, sequential talk move analysis, and a multi-view deep dive, we discovered meaningful discourse patterns and revealed the vital role of utterances without talk moves, demonstrating that these utterances, far from being mere fillers, serve crucial functions in guiding, acknowledging, and structuring classroom discourse. These insights underscore the importance of incorporating discourse relations and dialogue acts into AI-assisted education systems to enhance feedback and create more responsive learning environments. Our framework may prove helpful not only for providing feedback to human educators but also for aiding the development of AI agents that can effectively emulate the roles of both educators and students.
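The abstract names distributional unigram analysis and sequential talk move analysis without specifying an implementation. A minimal sketch, assuming each utterance carries a single talk-move label, with the placeholder "none" for utterances that contain no talk move: the unigram view reduces to relative tag frequencies, and a simple sequential view counts transitions between consecutive utterances. The tag names and the lesson below are hypothetical, and the paper's multi-perspective setup (talk moves plus dialogue acts plus discourse relations) is richer than this single-label illustration.

```python
from collections import Counter

def unigram_distribution(tags):
    """Relative frequency of each tag across all utterances."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def transition_counts(tags):
    """Count ordered (previous tag, next tag) pairs of consecutive utterances."""
    return Counter(zip(tags, tags[1:]))

# Hypothetical label sequence for one short lesson excerpt;
# "none" marks an utterance that contains no talk move.
lesson = ["none", "keeping_together", "none", "press_reasoning", "none", "revoicing"]

print(unigram_distribution(lesson))  # share of utterances per tag
print(transition_counts(lesson))     # which tags tend to follow which
```

Even on this toy sequence, half of the utterances carry no talk move, which is the kind of share the authors argue should not be dropped from feedback.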
Globally Optimal Data-Association-Free Landmark-Based Localization Using Semidefinite Relaxations
Korotkine, Vassili, Cohen, Mitchell, Forbes, James Richard
This paper proposes a semidefinite relaxation for landmark-based localization with unknown data associations in planar environments. The proposed method simultaneously solves for the optimal robot states and data associations in a globally optimal fashion. Relative position measurements to a fixed set of known landmarks are used, but the data association is unknown in that the robot does not know which landmark generated each measurement. The relaxation is shown to be tight in a majority of cases for moderate noise levels. The proposed algorithm is compared to local Gauss-Newton baselines initialized at the dead-reckoned trajectory, and is shown to significantly improve convergence to the problem's global optimum in simulation and experiment. Estimating the state of a robot from noisy and incomplete sensor data is a central task associated with autonomy. In the landmark-based localization task, the robot infers its position and orientation from measurements of landmarks with known positions. State estimation methods for localization can be split into filtering methods and batch optimization methods [1].
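To make "solving jointly for the robot state and the data association" concrete, the sketch below handles a single planar pose by exhaustive search: for every injective assignment of measurements to landmarks it solves for the pose in closed form (2D least-squares rigid registration) and keeps the cheapest assignment. This is not the paper's semidefinite relaxation; it is a brute-force baseline whose cost grows factorially with the number of landmarks, which is exactly why a convex relaxation is attractive. The measurement model y = Cᵀ(ℓ − r) (landmark ℓ expressed in the robot frame, with heading rotation C and position r), the noise-free example, and all function names are our assumptions.

```python
import math
from itertools import permutations

def register_2d(meas, lms):
    """Least-squares 2D rigid alignment: find heading theta and position t
    minimizing sum ||R(theta) m_i + t - l_i||^2 in closed form."""
    n = len(meas)
    mx = sum(p[0] for p in meas) / n
    my = sum(p[1] for p in meas) / n
    lx = sum(q[0] for q in lms) / n
    ly = sum(q[1] for q in lms) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(meas, lms):
        ax, ay, bx, by = px - mx, py - my, qx - lx, qy - ly
        s_dot += ax * bx + ay * by      # coefficient of cos(theta)
        s_cross += ax * by - ay * bx    # coefficient of sin(theta)
    theta = math.atan2(s_cross, s_dot)  # maximizes sum of b_i . R a_i
    c, s = math.cos(theta), math.sin(theta)
    tx = lx - (c * mx - s * my)         # t = centroid(l) - R centroid(m)
    ty = ly - (s * mx + c * my)
    cost = sum((c * px - s * py + tx - qx) ** 2 + (s * px + c * py + ty - qy) ** 2
               for (px, py), (qx, qy) in zip(meas, lms))
    return theta, (tx, ty), cost

def localize_brute_force(meas, landmarks):
    """Return (cost, theta, t, assoc) minimizing the cost over all injective
    associations; assoc[i] is the landmark index assigned to meas[i]."""
    best = None
    for assoc in permutations(range(len(landmarks)), len(meas)):
        theta, t, cost = register_2d(meas, [landmarks[j] for j in assoc])
        if best is None or cost < best[0]:
            best = (cost, theta, t, assoc)
    return best

# Noise-free example: true heading 0.5 rad, true position (1, 2),
# measurements generated from landmarks 2, 0 and 3 (in that order).
landmarks = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
c, s = math.cos(0.5), math.sin(0.5)
meas = [(c * (lx - 1) + s * (ly - 2), -s * (lx - 1) + c * (ly - 2))
        for lx, ly in (landmarks[2], landmarks[0], landmarks[3])]
cost, theta, t, assoc = localize_brute_force(meas, landmarks)
print(assoc, theta, t, cost)
```

At least two measurements are needed for the heading to be observable; with one measurement the rotation terms vanish and `atan2(0, 0)` silently returns 0.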
Transparent Adaptive Learning via Data-Centric Multimodal Explainable AI
Mosleh, Maryam, Devlin, Marie, Solaiman, Ellis
Artificial intelligence-driven adaptive learning systems are reshaping education through data-driven adaptation of learning experiences. Yet many of these systems lack transparency, offering limited insight into how decisions are made. Most explainable AI (XAI) techniques focus on technical outputs but neglect user roles and comprehension. This paper proposes a hybrid framework that integrates traditional XAI techniques with generative AI models and user personalisation to generate multimodal, personalised explanations tailored to user needs. We redefine explainability as a dynamic communication process tailored to user roles and learning goals. We outline the framework's design, key XAI limitations in education, and research directions on accuracy, fairness, and personalisation. Our aim is to move towards explainable AI that enhances transparency while supporting user-centred experiences.
Learning to Optimize Feedback for One Million Students: Insights from Multi-Armed and Contextual Bandits in Large-Scale Online Tutoring
Schmucker, Robin, Pachapurkar, Nimish, Bala, Shanmuga, Shah, Miral, Mitchell, Tom
We present an online tutoring system that learns to provide effective feedback to students after they answer questions incorrectly. Using data from one million students, the system learns which assistance action (e.g., one of multiple hints) to provide for each question to optimize student learning. Employing the multi-armed bandit (MAB) framework and offline policy evaluation, we assess 43,000 assistance actions, and identify trade-offs between assistance policies optimized for different student outcomes (e.g., response correctness, session completion). We design an algorithm that for each question decides on a suitable policy training objective to enhance students' immediate second attempt success and overall practice session performance. We evaluate the resulting MAB policies in 166,000 practice sessions, verifying significant improvements in student outcomes. While MAB policies optimize feedback for the overall student population, we further investigate whether contextual bandit (CB) policies can enhance outcomes by personalizing feedback based on individual student features (e.g., ability estimates, response times). Using causal inference, we examine (i) how effects of assistance actions vary across students and (ii) whether CB policies, which leverage such effect heterogeneity, outperform MAB policies. While our analysis reveals that some actions for some questions exhibit effect heterogeneity, effect sizes may often be too small for CB policies to provide significant improvements beyond what well-optimized MAB policies that deliver the same action to all students already achieve. We discuss insights gained from deploying data-driven systems at scale and implications for future refinements. Today, the teaching policies optimized by our system support thousands of students daily.
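The abstract frames hint selection as a multi-armed bandit in which each assistance action is an arm and a student outcome is the reward, but it does not state which bandit algorithm is used. As a hedged illustration of the general idea only, the sketch below runs Beta-Bernoulli Thompson sampling on simulated data. The three arms, their success rates, and the function name are hypothetical; in a deployed system the reward would be an observed binary outcome such as a correct second attempt.

```python
import random

def thompson_sampling(success_rates, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over a fixed set of arms.
    success_rates are simulated per-arm probabilities of a binary outcome;
    returns how often each arm was chosen."""
    rng = random.Random(seed)
    k = len(success_rates)
    wins, losses, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior) and play the arm with the largest draw.
        draws = [rng.betavariate(1 + wins[i], 1 + losses[i]) for i in range(k)]
        arm = max(range(k), key=draws.__getitem__)
        pulls[arm] += 1
        if rng.random() < success_rates[arm]:  # simulated student outcome
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls

# Three hypothetical hints for one question, with made-up success rates.
pulls = thompson_sampling([0.2, 0.5, 0.8], rounds=3000)
print(pulls)  # most pulls should go to the third hint
```

Because a policy of this kind delivers the same action to everyone, it corresponds to the population-level MAB setting; a contextual bandit would additionally condition the posterior on student features.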
Multi-modal Relational Item Representation Learning for Inferring Substitutable and Complementary Items
Wang, Junting, Guo, Chenghuan, Yang, Jiao, Guo, Yanhui, Gao, Yan, Sundaram, Hari
We introduce a novel self-supervised multi-modal relational item representation learning framework designed to infer substitutable and complementary items. Existing approaches primarily focus on modeling item-item associations deduced from user behaviors using graph neural networks (GNNs) or leveraging item content information. However, these methods often overlook critical challenges, such as noisy user behavior data and data sparsity due to the long-tailed distribution of these behaviors. In this paper, we propose MMSC, a self-supervised multi-modal relational item representation learning framework to address these challenges. Specifically, MMSC consists of three main components: (1) a multi-modal item representation learning module that leverages a multi-modal foundation model and learns from item metadata, (2) a self-supervised behavior-based representation learning module that denoises and learns from user behavior data, and (3) a hierarchical representation aggregation mechanism that integrates item representations at both the semantic and task levels. Additionally, we leverage LLMs to generate augmented training data, further enhancing the denoising process during training. We conduct extensive experiments on five real-world datasets, showing that MMSC outperforms existing baselines by 26.1% for substitutable recommendation and 39.2% for complementary recommendation. In addition, we empirically show that MMSC is effective in modeling cold-start items.
EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework
Shi, Yao, Liang, Rongkeng, Xu, Yong
Large language models (LLMs) increasingly serve as educational tools, yet evaluating their teaching capabilities remains challenging due to the resource-intensive, context-dependent, and methodologically complex nature of teacher-student interactions. We introduce EducationQ, a multi-agent dialogue framework that efficiently assesses teaching capabilities through simulated dynamic educational scenarios, featuring specialized agents for teaching, learning, and evaluation. Testing 14 LLMs from major AI organizations (OpenAI, Meta, Google, Anthropic, and others) on 1,498 questions spanning 13 disciplines and 10 difficulty levels reveals that teaching effectiveness does not correlate linearly with model scale or general reasoning capabilities; some smaller open-source models outperform larger commercial counterparts in teaching contexts. This finding highlights a critical gap in current evaluations, which prioritize knowledge recall over interactive pedagogy. Our mixed-methods evaluation, combining quantitative metrics with qualitative analysis and expert case studies, identifies distinct pedagogical strengths employed by top-performing models (e.g., sophisticated questioning strategies, adaptive feedback mechanisms). Human expert evaluations show 78% agreement with our automated qualitative analysis of effective teaching behaviors, validating our methodology. EducationQ demonstrates that LLMs-as-teachers require specialized optimization beyond simple scaling, suggesting that next-generation educational AI should prioritize targeted enhancement of specific pedagogical capabilities.
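The abstract reports 78% agreement between human experts and the automated analysis but does not say which statistic was used. Raw percent agreement does not correct for agreement expected by chance, so it is often reported alongside Cohen's kappa. The sketch below computes both for two raters; the label sequences are invented for illustration and are not data from the paper.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items on which two raters assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Expected agreement if both raters labeled independently
    # with their own marginal label frequencies.
    p_e = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    if p_e == 1:
        return 1.0  # degenerate case: both raters always use one label
    return (p_o - p_e) / (1 - p_e)

# Hypothetical judgments of whether a teaching behavior is "effective".
expert = ["yes", "yes", "no", "yes", "no"]
automated = ["yes", "no", "no", "yes", "no"]
print(percent_agreement(expert, automated), cohens_kappa(expert, automated))
```

Here 80% raw agreement shrinks to a kappa of about 0.62 once chance agreement (0.48) is removed, which illustrates why the two numbers can tell different stories.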
ELMES: An Automated Framework for Evaluating Large Language Models in Educational Scenarios
Wei, Shou'ang, Wang, Xinyun, Bi, Shuzhen, Chen, Jian, Li, Ruijia, Jiang, Bo, Lin, Xin, Zhang, Min, Song, Yu, Li, BingDong, Zhou, Aimin, Hao, Hao
The emergence of Large Language Models (LLMs) presents transformative opportunities for education, generating numerous novel application scenarios. However, significant challenges remain: evaluation metrics vary substantially across different educational scenarios, while many emerging scenarios lack appropriate assessment metrics. To address this gap, we introduce ELMES, an open-source automated evaluation framework specifically designed for assessing LLMs in educational settings. ELMES features a modular architecture that enables researchers to create dynamic, multi-agent dialogues through simple configuration files, facilitating flexible scenario design without requiring extensive programming expertise. The framework incorporates a hybrid evaluation engine that objectively quantifies traditionally subjective pedagogical metrics using an LLM-as-a-Judge methodology. We conduct systematic benchmarking of state-of-the-art LLMs across four critical educational scenarios: Knowledge Point Explanation, Guided Problem-Solving Teaching, Interdisciplinary Lesson Plan Generation, and Contextualized Question Generation, employing fine-grained metrics developed in collaboration with education specialists. Our results demonstrate distinct capability distributions among models, revealing context-specific strengths and limitations. ELMES provides educators and researchers with an accessible evaluation framework that significantly reduces adaptation barriers for diverse educational applications while advancing the practical implementation of LLMs in pedagogy.
Introduction
The advent of Large Language Models (LLMs) is reshaping the educational paradigm with unprecedented potential [1]. Their powerful capabilities in natural language understanding and generation have paved new ways for intelligent teaching and learning. Consequently, researchers are actively exploring various avenues to leverage LLMs for educational empowerment.
SmartCourse: A Contextual AI-Powered Course Advising System for Undergraduates
Mi, Yixuan, Yu, Yiduo, Zhao, Yiyi
We present SmartCourse, an integrated course management and AI-driven advising system for undergraduate students, specifically tailored to the Computer Science (CPS) major. SmartCourse addresses the limitations of traditional advising tools by integrating transcript and plan information for student-specific context. The system combines a command-line interface (CLI) and a Gradio web GUI for instructors and students, manages user accounts, course enrollment, grading, and four-year degree plans, and integrates a locally hosted large language model (via Ollama) for personalized course recommendations. It leverages the student's transcript and major plan to offer contextual advice (e.g., prioritizing requirements or retakes). We evaluated the system on 25 representative advising queries and introduced custom metrics (PlanScore, PersonalScore, Lift, and Recall) to assess recommendation quality across different context conditions. Experiments show that using full context yields substantially more relevant recommendations than context-omitted modes, confirming the necessity of transcript and plan information for personalized academic advising. SmartCourse thus demonstrates how transcript-aware AI can enhance academic planning.
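The abstract names Recall and Lift but does not define them. Under one plausible reading, which is our assumption and not the paper's definition, Recall is the fraction of relevant courses that appear in a recommendation list, and Lift is the gain of the full-context mode over a context-omitted mode on the same query. The course identifiers below are hypothetical placeholders.

```python
def recall(recommended, relevant):
    """Assumed definition: fraction of relevant courses that were recommended."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(recommended)) / len(relevant)

def lift(score_full, score_ablated):
    """Assumed definition: absolute gain of full context over an ablated mode."""
    return score_full - score_ablated

# Hypothetical query: the courses a student still needs next term.
relevant = {"CPS-A", "CPS-B", "MATH-C"}
full_context = ["CPS-A", "CPS-B", "CPS-X"]  # transcript + plan supplied
no_context = ["CPS-X", "CPS-Y", "CPS-A"]    # context omitted

r_full = recall(full_context, relevant)
r_none = recall(no_context, relevant)
print(r_full, r_none, lift(r_full, r_none))
```

Averaging such per-query scores over the 25 advising queries would give one way to compare context conditions; the paper's own PlanScore and PersonalScore are not reconstructed here.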
Hybrid EEG-Driven Brain-Computer Interface: A Large Language Model Framework for Personalized Language Rehabilitation
Hossain, Ismail, Banik, Mridul
Conventional augmentative and alternative communication (AAC) systems and language-learning platforms often fail to adapt in real time to the user's cognitive and linguistic needs, especially in neurological conditions such as post-stroke aphasia or amyotrophic lateral sclerosis. Recent advances in noninvasive electroencephalography (EEG)-based brain-computer interfaces (BCIs) and transformer-based large language models (LLMs) offer complementary strengths: BCIs capture users' neural intent with low fatigue, while LLMs generate contextually tailored language content. Objective: We propose and evaluate a novel hybrid framework that leverages real-time EEG signals to drive an LLM-powered language rehabilitation assistant. This system aims to: (1) enable users with severe speech or motor impairments to navigate language-learning modules via mental commands; (2) dynamically personalize vocabulary, sentence-construction exercises, and corrective feedback; and (3) monitor neural markers of cognitive effort to adjust task difficulty on the fly. All individuals have the right to self-expression, social participation, and the agency to impact their environment. For individuals with complex communication needs, augmentative and alternative communication (AAC) systems provide critical tools to facilitate communication. However, traditional AAC methods, such as printed communication boards or eye-gaze devices, may not be accessible for individuals with severe speech and physical impairments (SSPI).
AI-Reporter: A Path to a New Genre of Scientific Communication
The AI-Reporter represents a paradigmatic shift in scientific publication practice. This document demonstrates, through a concrete case study, how our system transforms academic presentations into publication-ready chapters in less than three minutes. Using Arno Simons' lecture on Large Language Models from the "Large Language Models for the History, Philosophy, and Sociology of Science" workshop (NEPI) as an example, we show how technological innovation bridges the gap between ephemeral presentation and permanent scientific documentation.