Learning Management
Learning and Computation of $\Phi$-Equilibria at the Frontier of Tractability
Zhang, Brian Hu, Anagnostides, Ioannis, Tewolde, Emanuel, Berker, Ratip Emin, Farina, Gabriele, Conitzer, Vincent, Sandholm, Tuomas
$\Phi$-equilibria -- and the associated notion of $\Phi$-regret -- are a powerful and flexible framework at the heart of online learning and game theory, whereby enriching the set of deviations $\Phi$ begets stronger notions of rationality. Recently, Daskalakis, Farina, Fishelson, Pipis, and Schneider (STOC '24) -- abbreviated as DFFPS -- settled the existence of efficient algorithms when $\Phi$ contains only linear maps under a general, $d$-dimensional convex constraint set $\mathcal{X}$. In this paper, we significantly extend their work by resolving the case where $\Phi$ is $k$-dimensional; degree-$\ell$ polynomials constitute a canonical such example with $k = d^{O(\ell)}$. In particular, positing only oracle access to $\mathcal{X}$, we obtain two main positive results: i) a $\text{poly}(n, d, k, \text{log}(1/\epsilon))$-time algorithm for computing $\epsilon$-approximate $\Phi$-equilibria in $n$-player multilinear games, and ii) an efficient online algorithm that incurs average $\Phi$-regret at most $\epsilon$ using $\text{poly}(d, k)/\epsilon^2$ rounds. We also show nearly matching lower bounds in the online learning setting, thereby obtaining for the first time a family of deviations that captures the learnability of $\Phi$-regret. From a technical standpoint, we extend the framework of DFFPS from linear maps to the more challenging case of maps with polynomial dimension. At the heart of our approach is a polynomial-time algorithm for computing an expected fixed point of any $\phi : \mathcal{X} \to \mathcal{X}$ based on the ellipsoid against hope (EAH) algorithm of Papadimitriou and Roughgarden (JACM '08). In particular, our algorithm for computing $\Phi$-equilibria is based on executing EAH in a nested fashion -- each step of EAH itself being implemented by invoking a separate call to EAH.
An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces
Terenin, Alexander, Negrea, Jeffrey
We develop an analysis of Thompson sampling for online learning under full feedback - also known as prediction with expert advice - where the learner's prior is defined over the space of an adversary's future actions, rather than the space of experts. We show regret decomposes into regret the learner expected a priori, plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As an initial step towards practical online learning in settings with a potentially-uncountably-infinite number of experts, we show that Thompson sampling with a certain Gaussian process prior widely-used in the Bayesian optimization literature has a $\mathcal{O}(\beta\sqrt{T\log(1+\lambda)})$ rate against a $\beta$-bounded $\lambda$-Lipschitz adversary.
Intelligent Tutors Beyond K-12: An Observational Study of Adult Learner Engagement and Academic Impact
Gupta, Adit, MacLellan, Christopher
Intelligent tutors have proven to be effective in K-12 education, though their impact on adult learners -- especially as a supplementary resource -- remains underexplored. Understanding how adults voluntarily engage with educational technologies can inform the design of tools that support skill re-learning and enhancement. More critically, it helps determine whether tutoring systems, which are typically built for K-12 learners, can also support adult populations. This study examines the adoption, usage patterns, and effectiveness of a novel tutoring system, Apprentice Tutors, among adult learners at a state technical college. We analyze three types of data including, user demographics, grades, and tutor interactions, to assess whether voluntary tutor usage translates into measurable learning gains. Our findings reveal key temporal patterns in tutor engagement and provide evidence of learning within tutors, as determined through skill improvement in knowledge components across tutors. We also found evidence that this learning transferred outside the tutor, as observed through higher course assessment scores following tutor usage. These results suggest that intelligent tutors are a viable tool for adult learners, warranting further research into their long-term impact on this population.
Online Learning of Danger Avoidance for Complex Structures of Musculoskeletal Humanoids and Its Applications
Kawaharazuka, Kento, Hiraoka, Naoki, Koga, Yuya, Nishiura, Manabu, Omura, Yusuke, Asano, Yuki, Okada, Kei, Kawasaki, Koji, Inaba, Masayuki
-- The complex structure of musculoskeletal humanoids makes it difficult to model them, and the inter-body interference and high internal muscle force are unavoidable. Although various safety mechanisms have been developed to solve this problem, it is important not only to deal with the dangers when they occur but also to prevent them from happening. In this study, we propose a method to learn a network outputting danger probability corresponding to the muscle length online so that the robot can gradually prevent dangers from occurring. Applications of this network for control are also described. The method is applied to the musculoskeletal humanoid, Musashi, and its effectiveness is verified. I. INTRODUCTION The musculoskeletal humanoid [1]-[4] has various biomimetic advantages such as variable stiffness using redundant muscles, spherical joints without singular points, underactuated and flexible fingers, etc. At the same time, its complex musculoskeletal structure is difficult to model and various learning control methods have been developed [5]- [8].
Worst-case Error Bounds for Online Learning of Smooth Functions
Online learning is a model of machine learning where the learner is trained on sequential feedback. We investigate worst-case error for the online learning of real functions that have certain smoothness constraints. Suppose that $\mathcal{F}_q$ is the class of all absolutely continuous functions $f: [0, 1] \rightarrow \mathbb{R}$ such that $\|f'\|_q \le 1$, and $\operatorname{opt}_p(\mathcal{F}_q)$ is the best possible upper bound on the sum of the $p^{\text{th}}$ powers of absolute prediction errors for any number of trials guaranteed by any learner. We show that for any $\delta, \epsilon \in (0, 1)$, $\operatorname{opt}_{1+\delta} (\mathcal{F}_{1+\epsilon}) = O(\min(\delta, \epsilon)^{-1})$. Combined with the previous results of Kimber and Long (1995) and Geneson and Zhou (2023), we achieve a complete characterization of the values of $p, q \ge 1$ that result in $\operatorname{opt}_p(\mathcal{F}_q)$ being finite, a problem open for nearly 30 years. We study the learning scenarios of smooth functions that also belong to certain special families of functions, such as polynomials. We prove a conjecture by Geneson and Zhou (2023) that it is not any easier to learn a polynomial in $\mathcal{F}_q$ than it is to learn any general function in $\mathcal{F}_q$. We also define a noisy model for the online learning of smooth functions, where the learner may receive incorrect feedback up to $\eta \ge 1$ times, denoting the worst-case error bound as $\operatorname{opt}^{\text{nf}}_{p, \eta} (\mathcal{F}_q)$. We prove that $\operatorname{opt}^{\text{nf}}_{p, \eta} (\mathcal{F}_q)$ is finite if and only if $\operatorname{opt}_p(\mathcal{F}_q)$ is. Moreover, we prove for all $p, q \ge 2$ and $\eta \ge 1$ that $\operatorname{opt}^{\text{nf}}_{p, \eta} (\mathcal{F}_q) = \Theta (\eta)$.
Rapid Online Learning of Hip Exoskeleton Assistance Preferences
Ramella, Giulia, Ijspeert, Auke, Bouri, Mohamed
-- Hip exoskeletons are increasing in popularity due to their effectiveness across various scenarios and their ability to adapt to different users. However, personalizing the assistance often requires lengthy tuning procedures and computationally intensive algorithms, and most existing methods do not incorporate user feedback. In this work, we propose a novel approach for rapidly learning users' preferences for hip exoskeleton assistance. We perform pairwise comparisons of distinct randomly generated assistive profiles, and collect participants preferences through active querying. Users' feedback is integrated into a preference-learning algorithm that updates its belief, learns a user-dependent reward function, and changes the assistive torque profiles accordingly. Results from eight healthy subjects display distinct preferred torque profiles, and users' choices remain consistent when compared to a perturbed profile. A comprehensive evaluation of users' preferences reveals a close relationship with individual walking strategies. The tested torque profiles do not disrupt kinematic joint synergies, and participants favor assistive torques that are synchronized with their movements, resulting in lower negative power from the device. This straightforward approach enables the rapid learning of users preferences and rewards, grounding future studies on reward-based human-exoskeleton interaction.
Assessing a Single Student's Concentration on Learning Platforms: A Machine Learning-Enhanced EEG-Based Framework
Zhuo, Zewen, Najafi, Mohamad, Zein, Hazem, Nait-Ali, Amine
This study introduces a specialized pipeline designed to classify the concentration state of an individual student during online learning sessions by training a custom-tailored machine learning model. Detailed protocols for acquiring and preprocessing EEG data are outlined, along with the extraction of fifty statistical features from five EEG signal bands: alpha, beta, theta, delta, and gamma. Following feature extraction, a thorough feature selection process was conducted to optimize the data inputs for a personalized analysis. The study also explores the benefits of hyperparameter fine-tuning to enhance the classification accuracy of the student's concentration state. EEG signals were captured from the student using a Muse headband (Gen 2), equipped with five electrodes (TP9, AF7, AF8, TP10, and a reference electrode NZ), during engagement with educational content on computer-based e-learning platforms. Employing a random forest model customized to the student's data, we achieved remarkable classification performance, with test accuracies of 97.6% in the computer-based learning setting and 98% in the virtual reality setting. These results underscore the effectiveness of our approach in delivering personalized insights into student concentration during online educational activities.
No-regret incentive-compatible online learning under exact truthfulness with non-myopic experts
Komiyama, Junpei, Mehta, Nishant A., Mortazavi, Ali
We study an online forecasting setting in which, over $T$ rounds, $N$ strategic experts each report a forecast to a mechanism, the mechanism selects one forecast, and then the outcome is revealed. In any given round, each expert has a belief about the outcome, but the expert wishes to select its report so as to maximize the total number of times it is selected. The goal of the mechanism is to obtain low belief regret: the difference between its cumulative loss (based on its selected forecasts) and the cumulative loss of the best expert in hindsight (as measured by the experts' beliefs). We consider exactly truthful mechanisms for non-myopic experts, meaning that truthfully reporting its belief strictly maximizes the expert's subjective probability of being selected in any future round. Even in the full-information setting, it is an open problem to obtain the first no-regret exactly truthful mechanism in this setting. We develop the first no-regret mechanism for this setting via an online extension of the Independent-Event Lotteries Forecasting Competition Mechanism (I-ELF). By viewing this online I-ELF as a novel instance of Follow the Perturbed Leader (FPL) with noise based on random walks with loss-dependent perturbations, we obtain $\tilde{O}(\sqrt{T N})$ regret. Our results are fueled by new tail bounds for Poisson binomial random variables that we develop. We extend our results to the bandit setting, where we give an exactly truthful mechanism obtaining $\tilde{O}(T^{2/3} N^{1/3})$ regret; this is the first no-regret result even among approximately truthful mechanisms.
New Rates in Stochastic Decision-Theoretic Online Learning under Differential Privacy
Hu and Mehta (2024) posed an open problem: what is the optimal instance-dependent rate for the stochastic decision-theoretic online learning (with $K$ actions and $T$ rounds) under $\varepsilon$-differential privacy? Before, the best known upper bound and lower bound are $O\left(\frac{\log K}{\Delta_{\min}} + \frac{\log K\log T}{\varepsilon}\right)$ and $\Omega\left(\frac{\log K}{\Delta_{\min}} + \frac{\log K}{\varepsilon}\right)$ (where $\Delta_{\min}$ is the gap between the optimal and the second actions). In this paper, we partially address this open problem by having two new results. First, we provide an improved upper bound for this problem $O\left(\frac{\log K}{\Delta_{\min}} + \frac{\log^2K}{\varepsilon}\right)$, where the $T$-dependency has been removed. Second, we introduce the deterministic setting, a weaker setting of this open problem, where the received loss vector is deterministic and we can focus on the analysis for $\varepsilon$ regardless of the sampling error. At the deterministic setting, we prove upper and lower bounds that match at $\Theta\left(\frac{\log K}{\varepsilon}\right)$, while a direct application of the analysis and algorithms from the original setting still leads to an extra log factor. Technically, we introduce the Bernoulli resampling trick, which enforces a monotonic property for the output from report-noisy-max mechanism that enables a tighter analysis. Moreover, by replacing the Laplace noise with Gumbel noise, we derived explicit integral form that gives a tight characterization of the regret in the deterministic case.
Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent
Chu, Ya-Chi, Gao, Wenzhi, Ye, Yinyu, Udell, Madeleine
This paper investigates the convergence properties of the hypergradient descent method (HDM), a 25-year-old heuristic originally proposed for adaptive stepsize selection in stochastic first-order methods. We provide the first rigorous convergence analysis of HDM using the online learning framework of [Gao24] and apply this analysis to develop new state-of-the-art adaptive gradient methods with empirical and theoretical support. Notably, HDM automatically identifies the optimal stepsize for the local optimization landscape and achieves local superlinear convergence. Our analysis explains the instability of HDM reported in the literature and proposes efficient strategies to address it. We also develop two HDM variants with heavy-ball and Nesterov momentum. Experiments on deterministic convex problems show HDM with heavy-ball momentum (HDM-HB) exhibits robust performance and significantly outperforms other adaptive first-order methods. Moreover, HDM-HB often matches the performance of L-BFGS, an efficient and practical quasi-Newton method, using less memory and cheaper iterations.