Goto

Collaborating Authors

 Instructional Material


Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions

arXiv.org Artificial Intelligence

A long-standing problem in online reinforcement learning (RL) is of ensuring sample efficiency, which stems from an inability to explore environments efficiently. Most attempts at efficient exploration tackle this problem in a setting where learning begins from scratch, without prior information available to bootstrap learning. However, such approaches fail to leverage expert demonstrations and simulators that can reset to arbitrary states. These affordances are valuable resources that offer enormous potential to guide exploration and speed up learning. In this paper, we explore how a small number of expert demonstrations and a simulator allowing arbitrary resets can accelerate learning during online RL. We find that training with a suitable choice of an auxiliary start state distribution that may differ from the true start state distribution of the underlying Markov Decision Process can significantly improve sample efficiency. We find that using a notion of safety to inform the choice of this auxiliary distribution significantly accelerates learning. By using episode length information as a way to operationalize this notion, we demonstrate state-of-the-art sample efficiency on a sparse-reward hard-exploration environment.


Replacing thinking with tool usage enables reasoning in small language models

arXiv.org Artificial Intelligence

Recent advances have established a new machine learning paradigm based on scaling up compute at inference time as well as at training time. In that line of work, a combination of Supervised Fine-Tuning (SFT) on synthetic demonstrations and Reinforcement Learning with Verifiable Rewards (RLVR) is used for training Large Language Models to expend extra compute during inference in the form of "thoughts" expressed in natural language. In this paper, we propose to instead format these tokens as a multi-turn interaction trace with a stateful tool. At each turn, the new state of the tool is appended to the context of the model, whose job is to generate the tokens necessary to control the tool via a custom DSL. We benchmark this approach on the problem of repairing malfunctioning Python code, and show that this constrained setup allows for faster sampling of experience and a denser reward signal, allowing even models of size up to 3B parameters to learn how to proficiently expend additional compute on the task.


Detection of Disengagement from Voluntary Quizzes: An Explainable Machine Learning Approach in Higher Distance Education

arXiv.org Artificial Intelligence

--Students disengaging from their tasks can have serious long-term consequences, including academic drop-out. This is particularly relevant for students in distance education. One way to measure the level of disengagement in distance education is to observe participation in non-mandatory exercises in different online courses. In this paper, we detect student disengagement in the non-mandatory quizzes of 42 courses in four semesters from a distance-based university. We carefully identified the most informative student log data that could be extracted and processed from Moodle. Then, eight machine learning algorithms were trained and compared to obtain the highest possible prediction accuracy. Using the SHAP method, we developed an explainable machine learning framework that allows practitioners to better understand the decisions of the trained algorithm. The experimental results show a balanced accuracy of 91%, where about 85% of disengaged students were correctly detected. On top of the highly predictive performance and explainable framework, we provide a discussion on how to design a timely intervention to minimise disengagement from voluntary tasks in online learning. HE advent of distance education has made learning more flexible than ever before. Instead of having to attend classes and solve tasks at specific time, students are granted more freedom in choosing when to engage with their academic workload. This flexibility attracts many non-traditional student groups to higher education, including students that are employed outside of their studies, either fully or part-time. While deadlines are still set in place, students are responsible themselves for planning and time management, especially as far as non-mandatory tasks and exercises are concerned. This freedom can also lead to satisficing behaviour, meaning students only do the bare minimum to pass their courses (see e.g., [1], [2]). Bergamin are with the Institute for Research in Open-, Distance-and eLearning, Swiss Distance University of Applied Sciences, Brig, CH-3900, Switzerland (e-mail addresses: behnam.parsaeifard@ffhs.ch, N. Bergamin (e-mail address: nicole.bergamin@ffhs.ch) is with Department of Informatics, Swiss Distance University of Applied Sciences, Brig, CH-3900, Switzerland. Bergamin is also with the North-West University, Potchefstroom, 2531, South Africa. The COVID-19 pandemic is thought to have fostered this kind of behaviour even more [4]. Non-completion of voluntary tasks, such as optional quizzes, is a form of behavioural disengagement strongly linked to academic drop-out or attrition [5]-[8].


A Dynamical Systems Perspective on the Analysis of Neural Networks

arXiv.org Artificial Intelligence

In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution we demonstrate how to re-formulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory-dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies to use dynamics to study explainable and reliable AI can also be applied to settings such as generative models or fundamental issues in gradient training methods, such as backpropagation or vanishing/exploding gradients.


Kronecker-factored Approximate Curvature (KFAC) From Scratch

arXiv.org Artificial Intelligence

Kronecker-factored approximate curvature (KFAC) is arguably one of the most prominent curvature approximations in deep learning. Its applications range from optimization to Bayesian deep learning, training data attribution with influence functions, and model compression or merging. While the intuition behind KFAC is easy to understand, its implementation is tedious: It comes in many flavours, has common pitfalls when translating the math to code, and is challenging to test, which complicates ensuring a properly functioning implementation. Some of the authors themselves have dealt with these challenges and experienced the discomfort of not being able to fully test their code. Thanks to recent advances in understanding KFAC, we are now able to provide test cases and a recipe for a reliable KFAC implementation. This tutorial is meant as a ground-up introduction to KFAC. In contrast to the existing work, our focus lies on providing both math and code side-by-side and providing test cases based on the latest insights into KFAC that are scattered throughout the literature. We hope this tutorial provides a contemporary view of KFAC that allows beginners to gain a deeper understanding of this curvature approximation while lowering the barrier to its implementation, extension, and usage in practice.


Teacher training in the age of AI: Impact on AI Literacy and Teachers' Attitudes

arXiv.org Artificial Intelligence

The rapid integration of artificial intelligence (AI) in education requires teachers to develop AI competencies while preparing students for a society influenced by AI. This study evaluates the impact of an online teacher training program on German in-service teachers' AI literacy, usage behaviors, and attitudes toward AI. A pre-post design study was conducted with teachers (N1 = 291 for AI literacy, N2 = 436 for attitude assessment) participating in the course. The program combined synchronous and asynchronous learning formats, including webinars, self-paced modules, and practical projects. The participants exhibited notable improvements across all domains: AI literacy scores increased significantly, and all attitude items regarding AI usage and integration demonstrated significant positive changes. Teachers reported increased confidence in AI integration. Structured teacher training programs effectively enhance AI literacy and foster positive attitudes toward AI in education.


Toward Cyclic A.I. Modelling of Self-Regulated Learning: A Case Study with E-Learning Trace Data

arXiv.org Artificial Intelligence

Many e-learning platforms assert their ability or potential to improve students' self-regulated learning (SRL), however the cyclical and undirected nature of SRL theoretical models represent significant challenges for representation within contemporary machine learning frameworks. We apply SRL-informed features to trace data in order to advance modelling of students' SRL activities, to improve predictability and explainability regarding the causal effects of learning in an eLearning environment. We demonstrate that these features improve predictive accuracy and validate the value of further research into cyclic modelling techniques for SRL.


Crafting Hanzi as Narrative Bridges: An AI Co-Creation Workshop for Elderly Migrants

arXiv.org Artificial Intelligence

This paper explores how older adults, particularly aging migrants in urban China, can engage AI-assisted co-creation to express personal narratives that are often fragmented, underrepresented, or difficult to verbalize. Through a pilot workshop combining oral storytelling and the symbolic reconstruction of Hanzi, participants shared memories of migration and recreated new character forms using Xiaozhuan glyphs, suggested by the Large Language Model (LLM), together with physical materials. Supported by human facilitation and a soft AI presence, participants transformed lived experience into visual and tactile expressions without requiring digital literacy. This approach offers new perspectives on human-AI collaboration and aging by repositioning AI not as a content producer but as a supportive mechanism, and by supporting narrative agency within sociotechnical systems.


High-Order Deep Meta-Learning with Category-Theoretic Interpretation

arXiv.org Artificial Intelligence

We introduce a new hierarchical deep learning framework for recursive higher-order meta-learning that enables neural networks (NNs) to construct, solve, and generalise across hierarchies of tasks. Central to this approach is a generative mechanism that creates \emph{virtual tasks} -- synthetic problem instances designed to enable the meta-learner to learn \emph{soft constraints} and unknown generalisable rules across related tasks. Crucially, this enables the framework to generate its own informative, task-grounded datasets thereby freeing machine learning (ML) training from the limitations of relying entirely on human-generated data. By actively exploring the virtual point landscape and seeking out tasks lower-level learners find difficult, the meta-learner iteratively refines constraint regions. This enhances inductive biases, regularises the adaptation process, and produces novel, unanticipated tasks and constraints required for generalisation. Each meta-level of the hierarchy corresponds to a progressively abstracted generalisation of problems solved at lower levels, enabling a structured and interpretable learning progression. By interpreting meta-learners as category-theoretic \emph{functors} that generate and condition a hierarchy of subordinate learners, we establish a compositional structure that supports abstraction and knowledge transfer across progressively generalised tasks. The category-theoretic perspective unifies existing meta-learning models and reveals how learning processes can be transformed and compared through functorial relationships, while offering practical design principles for structuring meta-learning. We speculate this architecture may underpin the next generation of NNs capable of autonomously generating novel, instructive tasks and their solutions, thereby advancing ML towards general artificial intelligence.


Chargax: A JAX Accelerated EV Charging Simulator

arXiv.org Artificial Intelligence

Deep Reinforcement Learning can play a key role in addressing sustainable energy challenges. For instance, many grid systems are heavily congested, highlighting the urgent need to enhance operational efficiency. However, reinforcement learning approaches have traditionally been slow due to the high sample complexity and expensive simulation requirements. While recent works have effectively used GPUs to accelerate data generation by converting environments to JAX, these works have largely focussed on classical toy problems. This paper introduces Chargax, a JAX-based environment for realistic simulation of electric vehicle charging stations designed for accelerated training of RL agents. We validate our environment in a variety of scenarios based on real data, comparing reinforcement learning agents against baselines. Chargax delivers substantial computational performance improvements of over 100x-1000x over existing environments. Additionally, Chargax' modular architecture enables the representation of diverse real-world charging station configurations.