Instructional Material
Adaptive Learning Systems: Personalized Curriculum Design Using LLM-Powered Analytics
Li, Yongjie, Nong, Ruilin, Liu, Jianan, Evans, Lucas
--Large language models (LLMs) are revolutionizing the field of education by enabling personalized learning experiences tailored to individual student needs. For example, the curriculum reinforcement learning approach tailored for quantum architecture search effectively enhances computational efficiency in noisy environments by leveraging an optimized simulator [16]. The expectations and attitudes of students and teachers towards learning analytics are pivotal for effective implementation in higher education, as highlighted by recent assessments [22]. The framework's efficacy was confirmed through thorough Consequently, the personalized curriculum dynamically evolves to reflect each learner's progress, ensuring optimal The system continuously evaluates the learner's engagement Let us denote the student's performance metrics as We focus on leveraging the embeddings generated by the LLMs to classify learning materials and identify optimal pathways for students. Additionally, we assess model performance using metrics such as Learner Engagement Scores (LES) and Knowledge Retention Rates (KRR) across the implemented curriculum.
A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges
Zerkouk, Meriem, Mihoubi, Miloud, Chikhaoui, Belkacem
AI-based Intelligent Tutoring Systems (ITS) have significant potential to transform teaching and learning. As efforts continue to design, develop, and integrate ITS into educational contexts, mixed results about their effectiveness have emerged. This paper provides a comprehensive review to understand how ITS operate in real educational settings and to identify the associated challenges in their application and evaluation. We use a systematic literature review method to analyze numerous qualified studies published from 2010 to 2025, examining domains such as pedagogical strategies, NLP, adaptive learning, student modeling, and domain-specific applications of ITS. The results reveal a complex landscape regarding the effectiveness of ITS, highlighting both advancements and persistent challenges. The study also identifies a need for greater scientific rigor in experimental design and data analysis. Based on these findings, suggestions for future research and practical implications are proposed.
Test-time Offline Reinforcement Learning on Goal-related Experience
Bagatella, Marco, Albaba, Mert, Hรผbotter, Jonas, Martius, Georg, Krause, Andreas
Foundation models compress a large amount of information in a single, large neural network, which can then be queried for individual tasks. There are strong parallels between this widespread framework and offline goal-conditioned reinforcement learning algorithms: a universal value function is trained on a large number of goals, and the policy is evaluated on a single goal in each test episode. Extensive research in foundation models has shown that performance can be substantially improved through test-time training, specializing the model to the current goal. We find similarly that test-time offline reinforcement learning on experience related to the test goal can lead to substantially better policies at minimal compute costs. We propose a novel self-supervised data selection criterion, which selects transitions from an offline dataset according to their relevance to the current state and quality with respect to the evaluation goal. We demonstrate across a wide range of high-dimensional loco-navigation and manipulation tasks that fine-tuning a policy on the selected data for a few gradient steps leads to significant performance gains over standard offline pre-training. Our goal-conditioned test-time training (GC-TTT) algorithm applies this routine in a receding-horizon fashion during evaluation, adapting the policy to the current trajectory as it is being rolled out. Finally, we study compute allocation at inference, demonstrating that, at comparable costs, GC-TTT induces performance gains that are not achievable by scaling model size.
Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Chen, Fan, Jia, Zeyu, Rakhlin, Alexander, Xie, Tengyang
Reinforcement learning with outcome-based feedback faces a fundamental challenge: when rewards are only observed at trajectory endpoints, how do we assign credit to the right actions? This paper provides the first comprehensive analysis of this problem in online RL with general function approximation. We develop a provably sample-efficient algorithm achieving $\widetilde{O}({C_{\rm cov} H^3}/{ฮต^2})$ sample complexity, where $C_{\rm cov}$ is the coverability coefficient of the underlying MDP. By leveraging general function approximation, our approach works effectively in large or infinite state spaces where tabular methods fail, requiring only that value functions and reward functions can be represented by appropriate function classes. Our results also characterize when outcome-based feedback is statistically separated from per-step rewards, revealing an unavoidable exponential separation for certain MDPs. For deterministic MDPs, we show how to eliminate the completeness assumption, dramatically simplifying the algorithm. We further extend our approach to preference-based feedback settings, proving that equivalent statistical efficiency can be achieved even under more limited information. Together, these results constitute a theoretical foundation for understanding the statistical properties of outcome-based reinforcement learning.
Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning
Guo, Dongyang, Abdrabou, Yasmeen, Thaqi, Enkeleda, Kasneci, Enkelejda
Eye-tracking data reveals valuable insights into users' cognitive states but is difficult to analyze due to its structured, non-linguistic nature. While large language models (LLMs) excel at reasoning over text, they struggle with temporal and numerical data. This paper presents a multimodal human-AI collaborative framework designed to enhance cognitive pattern extraction from eye-tracking signals. The framework includes: (1) a multi-stage pipeline using horizontal and vertical segmentation alongside LLM reasoning to uncover latent gaze patterns; (2) an Expert-Model Co-Scoring Module that integrates expert judgment with LLM output to generate trust scores for behavioral interpretations; and (3) a hybrid anomaly detection module combining LSTM-based temporal modeling with LLM-driven semantic analysis. Our results across several LLMs and prompt strategies show improvements in consistency, interpretability, and performance, with up to 50% accuracy in difficulty prediction tasks. This approach offers a scalable, interpretable solution for cognitive modeling and has broad potential in adaptive learning, human-computer interaction, and educational analytics.
Multimodal Fine-grained Reasoning for Post Quality Evaluation
Guo, Xiaoxu, Liang, Siyan, Cui, Yachao, Zhou, Juxiang, Wang, Lei, Cao, Han
Accurately assessing post quality requires complex relational reasoning to capture nuanced topic-post relationships. However, existing studies face three major limitations: (1) treating the task as unimodal categorization, which fails to leverage multimodal cues and fine-grained quality distinctions; (2) introducing noise during deep multimodal fusion, leading to misleading signals; and (3) lacking the ability to capture complex semantic relationships like relevance and comprehensiveness. To address these issues, we propose the Multimodal Fine-grained Topic-post Relational Reasoning (MFTRR) framework, which mimics human cognitive processes. MFTRR reframes post-quality assessment as a ranking task and incorporates multimodal data to better capture quality variations. It consists of two key modules: (1) the Local-Global Semantic Correlation Reasoning Module, which models fine-grained semantic interactions between posts and topics at both local and global levels, enhanced by a maximum information fusion mechanism to suppress noise; and (2) the Multi-Level Evidential Relational Reasoning Module, which explores macro- and micro-level relational cues to strengthen evidence-based reasoning. We evaluate MFTRR on three newly constructed multimodal topic-post datasets and the public Lazada-Home dataset. Experimental results demonstrate that MFTRR significantly outperforms state-of-the-art baselines, achieving up to 9.52% NDCG@3 improvement over the best unimodal method on the Art History dataset.
Language-Conditioned Open-Vocabulary Mobile Manipulation with Pretrained Models
Tan, Shen, Zhou, Dong, Shao, Xiangyu, Wang, Junqiao, Sun, Guanghui
Open-vocabulary mobile manipulation (OVMM) that involves the handling of novel and unseen objects across different workspaces remains a significant challenge for real-world robotic applications. In this paper, we propose a novel Language-conditioned Open-V ocabulary Mobile Manipulation framework, named LOVMM, incorporating the large language model (LLM) and vision-language model (VLM) to tackle various mobile manipulation tasks in household environments. "toss the food boxes on the office room desk to the trash bin in the corner", and "pack the bottles from the bed to the box in the guestroom"). Extensive experiments simulated in complex household environments show strong zero-shot generalization and multi-task learning abilities of LOVMM. Moreover, our approach can also generalize to multiple tabletop manipulation tasks and achieve better success rates compared to other state-of-the-art methods. 1 Introduction As one of the key capabilities for robotic home assistance, open-vocabulary mobile manipulation (OVMM), which leverages vision cameras to navigate in the environment and execute human-like actions to manipulate unseen objects, has attracted wide attention. It is crucial for addressing real-world challenges such as object sorting and rearrangement [ Zeng et al., 2022 ], [ Gan et al., 2022 ], household cleanup [ Y anet al., 2021 ], [ Wu et al., 2023 ], and human assistance [ Y enamandraet al., 2023 ], [ Stone et al., 2023 ] . Traditionally, robotic manipulation relies on vision-based methods that use explicit, object-centric representations, including poses, categories, and instance segmentations for perception [ Pan et al., 2023 ], [ Geng et al., 2023a ], [ Xie et al., 2020] . Recently, end-to-end models that learn from expert demonstrations have emerged as promising alternatives [ Zeng et al., 2021 ], [ Seita et al., 2021 ], [ Geng et al., 2023b ] . By leveraging visual observations without any explicit object information, these models are able to extract more generalizable representations across different tasks and zero-shot adapt to unseen scenarios. Y et, such methods are limited by the insufficient information provided by the single-modal data, or they may require goal images as instructions to adapt to new situations.
Students' Feedback Requests and Interactions with the SCRIPT Chatbot: Do They Get What They Ask For?
Scholl, Andreas, Kiesler, Natalie
Building on prior research on Generative AI (GenAI) and related tools for programming education, we developed SCRIPT, a chatbot based on ChatGPT -4o-mini, to support novice learners. SCRIPT allows for open-ended interactions and structured guidance through predefined prompts. We evaluated the tool via an experiment with 136 students from an introductory programming course at a large German university and analyzed how students interacted with SCRIPT while solving programming tasks with a focus on their feedback preferences. The results reveal that students' feedback requests seem to follow a specific sequence. Moreover, the chatbot responses aligned well with students' requested feedback types (in 75%), and it adhered to the system prompt constraints. These insights inform the design of GenAI-based learning support systems and highlight challenges in balancing guidance and flexibility in AI-assisted tools.
DesignLab: Designing Slides Through Iterative Detection and Correction
Yun, Jooyeol, Wang, Heng, Shimose, Yotaro, Choo, Jaegul, Takamatsu, Shingo
Designing high-quality presentation slides can be challenging for non-experts due to the complexity involved in navigating various design choices. Numerous automated tools can suggest layouts and color schemes, yet often lack the ability to refine their own output, which is a key aspect in real-world workflows. We propose DesignLab, which separates the design process into two roles, the design reviewer, who identifies design-related issues, and the design contributor who corrects them. This decomposition enables an iterative loop where the reviewer continuously detects issues and the contributor corrects them, allowing a draft to be further polished with each iteration, reaching qualities that were unattainable. We fine-tune large language models for these roles and simulate intermediate drafts by introducing controlled perturbations, enabling the design reviewer learn design errors and the contributor learn how to fix them. Our experiments show that DesignLab outperforms existing design-generation methods, including a commercial tool, by embracing the iterative nature of designing which can result in polished, professional slides.
Probabilistic Graphical Models: A Concise Tutorial
Maasch, Jacqueline, Neiswanger, Willie, Ermon, Stefano, Kuleshov, Volodymyr
Probabilistic graphical modeling is a branch of machine learning that uses probability distributions to describe the world, make predictions, and support decision-making under uncertainty. Underlying this modeling framework is an elegant body of theory that bridges two mathematical traditions: probability and graph theory. This framework provides compact yet expressive representations of joint probability distributions, yielding powerful generative models for probabilistic reasoning. This tutorial provides a concise introduction to the formalisms, methods, and applications of this modeling framework. After a review of basic probability and graph theory, we explore three dominant themes: (1) the representation of multivariate distributions in the intuitive visual language of graphs, (2) algorithms for learning model parameters and graphical structures from data, and (3) algorithms for inference, both exact and approximate.