Learning Management
Hierarchical Upper Confidence Bounds for Constrained Online Learning
The multi-armed bandit (MAB) problem is a foundational framework in sequential decision-making under uncertainty, extensively studied for its applications in areas such as clinical trials, online advertising, and resource allocation. Traditional MAB formulations, however, do not adequately capture scenarios where decisions are structured hierarchically, involve multi-level constraints, or feature context-dependent action spaces. In this paper, we introduce the hierarchical constrained bandits (HCB) framework, which extends the contextual bandit problem to incorporate hierarchical decision structures and multi-level constraints. We propose the hierarchical constrained upper confidence bound (HC-UCB) algorithm, designed to address the complexities of the HCB problem by leveraging confidence bounds within a hierarchical setting. Our theoretical analysis establishes sublinear regret bounds for HC-UCB and provides high-probability guarantees for constraint satisfaction at all hierarchical levels. Furthermore, we derive a minimax lower bound on the regret for the HCB problem, demonstrating the near-optimality of our algorithm. The results are significant for real-world applications where decision-making processes are inherently hierarchical and constrained, offering a robust and efficient solution that balances exploration and exploitation across multiple levels of decision-making.
Guiding Empowerment Model: Liberating Neurodiversity in Online Higher Education
Beaux, Hannah, Karimi, Pegah, Pop, Otilia, Clark, Rob
In this innovative practice full paper, we address the equity gap for neurodivergent and situationally limited learners by identifying the spectrum of dynamic factors that impact learning and function. Educators have shown a growing interest in identifying learners' cognitive abilities and learning preferences to measure their impact on academic achievement. Often institutions employ one-size-fits-all approaches leaving the burden on disabled students to self-advocate or tolerate inadequate support. Emerging frameworks guide neurodivergent learners through instructional approaches, such as online education. However, these frameworks fail to address holistic environmental needs or recommend technology interventions, particularly for those with undisclosed learning or developmental disabilities and situational limitations. In this article, we integrate a neurodivergent perspective through secondary research of around 100 articles to introduce a Guiding Empowerment Model involving key cognitive and situational factors that contextualize day-to-day experiences affecting learner ability. We synthesize three sample student profiles that highlight user problems in functioning. We use this model to evaluate sample learning platform features and other supportive technology solutions. The proposed approach augments frameworks such as Universal Design for Learning to consider factors including various sensory processing differences, social connection challenges, and environmental limitations. We suggest that by applying the mode through technology-enabled features such as customizable task management, guided varied content access, and guided multi-modal collaboration, major learning barriers of neurodivergent and situationally limited learners will be removed to activate the successful pursuit of their academic goals.
Slicing for AI: An Online Learning Framework for Network Slicing Supporting AI Services
Helmy, Menna, Abdellatif, Alaa Awad, Mhaisen, Naram, Mohamed, Amr, Erbad, Aiman
The forthcoming 6G networks will embrace a new realm of AI-driven services that requires innovative network slicing strategies, namely slicing for AI, which involves the creation of customized network slices to meet Quality of service (QoS) requirements of diverse AI services. This poses challenges due to time-varying dynamics of users' behavior and mobile networks. Thus, this paper proposes an online learning framework to optimize the allocation of computational and communication resources to AI services, while considering their unique key performance indicators (KPIs), such as accuracy, latency, and cost. We define a problem of optimizing the total accuracy while balancing conflicting KPIs, prove its NP-hardness, and propose an online learning framework for solving it in dynamic environments. We present a basic online solution and two variations employing a pre-learning elimination method for reducing the decision space to expedite the learning. Furthermore, we propose a biased decision space subset selection by incorporating prior knowledge to enhance the learning speed without compromising performance and present two alternatives of handling the selected subset. Our results depict the efficiency of the proposed solutions in converging to the optimal decisions, while reducing decision space and improving time complexity.
Efficient Function Placement in Virtual Networks: An Online Learning Approach
Huang, Wei, Combes, Richard, Castel-Taleb, Hind, Jouaber, Badii
We propose a model for the virtual function placement problem and several novel algorithms using ideas based on multi-armed bandits. We prove that these algorithms learn the optimal placement policy rapidly, and their regret grows at a rate at most $O( N M \sqrt{T\ln T} )$ while respecting the feasibility constraints with high probability. We show through numerical experiments that those algorithms both have good practical performance and modest computational complexity. Using the proposed acceleration technique, they can be used to learn in large networks where computational power is limited. Our experiments are fully reproducible, and the code is publicly available.
An Online Learning Approach to Prompt-based Selection of Generative Models
Hu, Xiaoyan, Leung, Ho-fung, Farnia, Farzan
Selecting a sample generation scheme from multiple text-based generative models is typically addressed by choosing the model that maximizes an averaged evaluation score. However, this score-based selection overlooks the possibility that different models achieve the best generation performance for different types of text prompts. An online identification of the best generation model for various input prompts can reduce the costs associated with querying sub-optimal models. In this work, we explore the possibility of varying rankings of text-based generative models for different text prompts and propose an online learning framework to predict the best data generation model for a given input prompt. The proposed framework adapts the kernelized contextual bandit (CB) methodology to a CB setting with shared context variables across arms, utilizing the generated data to update a kernel-based function that predicts which model will achieve the highest score for unseen text prompts. Additionally, we apply random Fourier features (RFF) to the kernelized CB algorithm to accelerate the online learning process and establish a $\widetilde{\mathcal{O}}(\sqrt{T})$ regret bound for the proposed RFF-based CB algorithm over T iterations. Our numerical experiments on real and simulated text-to-image and image-to-text generative models show RFF-UCB performs successfully in identifying the best generation model across different sample types.
Fast Online Learning of CLiFF-maps in Changing Environments
Zhu, Yufei, Rudenko, Andrey, Palmieri, Luigi, Heuer, Lukas, Lilienthal, Achim J., Magnusson, Martin
Maps of dynamics are effective representations of motion patterns learned from prior observations, with recent research demonstrating their ability to enhance performance in various downstream tasks such as human-aware robot navigation, long-term human motion prediction, and robot localization. Current advancements have primarily concentrated on methods for learning maps of human flow in environments where the flow is static, i.e., not assumed to change over time. In this paper we propose a method to update the CLiFF-map, one type of map of dynamics, for achieving efficient life-long robot operation. As new observations are collected, our goal is to update a CLiFF-map to effectively and accurately integrate new observations, while retaining relevant historic motion patterns. The proposed online update method maintains a probabilistic representation in each observed location, updating parameters by continuously tracking sufficient statistics. In experiments using both synthetic and real-world datasets, we show that our method is able to maintain accurate representations of human motion dynamics, contributing to high performance flow-compliant planning downstream tasks, while being orders of magnitude faster than the comparable baselines.
Personalised Feedback Framework for Online Education Programmes Using Generative AI
Kuzminykh, Ievgeniia, Nawaz, Tareita, Shenzhang, Shihao, Ghita, Bogdan, Raphael, Jeffery, Xiao, Hannan
AI tools, particularly large language modules, have recently proven their effectiveness within learning management systems and online education programmes. As feedback continues to play a crucial role in learning and assessment in schools, educators must carefully customise the use of AI tools in order to optimally support students in their learning journey. Efforts to improve educational feedback systems have seen numerous attempts reflected in the research studies but mostly have been focusing on qualitatively benchmarking AI feedback against human-generated feedback. This paper presents an exploration of an alternative feedback framework which extends the capabilities of ChatGPT by integrating embeddings, enabling a more nuanced understanding of educational materials and facilitating topic-targeted feedback for quiz-based assessments. As part of the study, we proposed and developed a proof of concept solution, achieving an efficacy rate of 90% and 100% for open-ended and multiple-choice questions, respectively. The results showed that our framework not only surpasses expectations but also rivals human narratives, highlighting the potential of AI in revolutionising educational feedback mechanisms.
Online Learning with Gaussian Payoffs and Side Observations
We consider a sequential learning problem with Gaussian payoffs and side information: after selecting an action i, the learner receives information about the payoff of every action j in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair (i,j) (and may be infinite). The setup allows a more refined information transfer from one action to another than previous partial monitoring setups, including the recently introduced graph-structured feedback case. For the first time in the literature, we provide non-asymptotic problem-dependent lower bounds on the regret of any algorithm, which recover existing asymptotic problem-dependent lower bounds and finite-time minimax lower bounds available in the literature. We also provide algorithms that achieve the problem-dependent lower bound (up to some universal constant factor) or the minimax lower bounds (up to logarithmic factors).
Online Learning with Adversarial Delays
We study the performance of standard online learning algorithms when the feedback is delayed by an adversary. This bound collapses to an optimal O(\sqrt{T}) bound in the usual setting of no delays (where D T). Our main contribution is to show that standard algorithms for online learning already have simple regret bounds in the most general setting of delayed feedback, making adjustments to the analysis and not to the algorithms themselves. Our results help affirm and clarify the success of recent algorithms in optimization and machine learning that operate in a delayed feedback model.
A Trichotomy for Transductive Online Learning
This setting is similar to standard online learning, except that the adversary fixes a sequence of instances x_1,\dots,x_n to be labeled at the start of the game, and this sequence is known to the learner. Qualitatively, we prove a \emph{trichotomy}, stating that the minimal number of mistakes made by the learner as n grows can take only one of precisely three possible values: n, \Theta\left(\log (n)\right), or \Theta(1) . Furthermore, this behavior is determined by a combination of the VC dimension and the Littlestone dimension. Quantitatively, we show a variety of bounds relating the number of mistakes to well-known combinatorial dimensions. In particular, we improve the known lower bound on the constant in the \Theta(1) case from \Omega\left(\sqrt{\log(d)}\right) to \Omega(\log(d)) where d is the Littlestone dimension.