Goto

Collaborating Authors

 Learning Management


A Regret-Variance Trade-Off in Online Learning

Neural Information Processing Systems

We consider prediction with expert advice for strongly convex and bounded losses, and investigate trade-offs between regret and variance'' (i.e., squared difference of learner's predictions and best expert predictions).With K experts, the Exponentially Weighted Average (EWA) algorithm is known to achieve O(\log K) regret.We prove that a variant of EWA either achieves a \textsl{negative} regret (i.e., the algorithm outperforms the best expert), or guarantees a O(\log K) bound on \textsl{both} variance and regret.Building on this result, we show several examples of how variance of predictions can be exploited in learning.In the online to batch analysis, we show that a large empirical variance allows to stop the online to batch conversion early and outperform the risk of the best predictor in the class. We also recover the optimal rate of model selection aggregation when we do not consider early stopping.In online prediction with corrupted losses, we show that the effect of corruption on the regret can be compensated by a large variance.In online selective sampling, we design an algorithm that samples less when the variance is large, while guaranteeing the optimal regret bound in expectation.In online learning with abstention, we use a similar term as the variance to derive the first high-probability O(\log K) regret bound in this setting.Finally, we extend our results to the setting of online linear regression.


A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs

Neural Information Processing Systems

We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally-efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both stochastic and adversarial environments. The bound against oblivious adversaries is \tilde{O} (\sqrt{\alpha T}), where T is the time horizon and \alpha is the independence number of the feedback graph. The bound against stochastic environments is O\big((\ln T) 2 \max_{S\in \mathcal I(G)} \sum_{i \in S} \Delta_i {-1}\big) where \mathcal I(G) is the family of all independent sets in a suitably defined undirected version of the graph and \Delta_i are the suboptimality gaps.The algorithm combines ideas from the EXP3 algorithm for stochastic and adversarial bandits and the EXP3.G algorithm for feedback graphs with a novel exploration scheme. The scheme, which exploits the structure of the graph to reduce exploration, is key to obtain best-of-both-worlds guarantees with feedback graphs.


Learning on the Edge: Online Learning with Stochastic Feedback Graphs

Neural Information Processing Systems

The framework of feedback graphs is a generalization of sequential decision-making with bandit or full information feedback. In this work, we study an extension where the directed feedback graph is stochastic, following a distribution similar to the classical Erdล‘s-Rรฉnyi model. Specifically, in each round every edge in the graph is either realized or not with a distinct probability for each edge. Our result, which holds without any preliminary knowledge about \mathcal{G}, requires the learner to observe only the realized out-neighborhood of the chosen action. When the learner is allowed to observe the realization of the entire graph (but only the losses in the out-neighborhood of the chosen action), we derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers that exhibits improved bounds for some special cases.


Best-case lower bounds in online learning

Neural Information Processing Systems

Much of the work in online learning focuses on the study of sublinear upper bounds on the regret. In this work, we initiate the study of best-case lower bounds in online convex optimization, wherein we bound the largest \emph{improvement} an algorithm can obtain relative to the single best action in hindsight. This problem is motivated by the goal of better understanding the adaptivity of a learning algorithm. Another motivation comes from fairness: it is known that best-case lower bounds are instrumental in obtaining algorithms for decision-theoretic online learning (DTOL) that satisfy a notion of group fairness. Our contributions are a general method to provide best-case lower bounds in Follow The Regularized Leader (FTRL) algorithms with time-varying regularizers, which we use to show that best-case lower bounds are of the same order as existing upper regret bounds: this includes situations with a fixed learning rate, decreasing learning rates, timeless methods, and adaptive gradient methods.


Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs

Neural Information Processing Systems

This study considers online learning with general directed feedback graphs. For this problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments as well as poly-logarithmic regret bounds for stochastic environments. As Alon et al. [2015] have shown, tight regret bounds depend on the structure of the feedback graph: strongly observable graphs yield minimax regret of \tilde{\Theta}( \alpha {1/2} T {1/2}), while weakly observable graphs induce minimax regret of \tilde{\Theta}( \delta {1/3} T {2/3}), where \alpha and \delta, respectively, represent the independence number of the graph and the domination number of a certain portion of the graph. This result resolves an open question raised by Erez and Koren [2021]. We also provide an algorithm for weakly observable graphs that achieves a regret bound of \tilde{O}( \delta {1/3}T {2/3}) for adversarial environments and poly-logarithmic regret for stochastic environments.


Optimal Comparator Adaptive Online Learning with Switching Cost

Neural Information Processing Systems

Practical online learning tasks are often naturally defined on unconstrained domains, where optimal algorithms for general convex losses are characterized by the notion of comparator adaptivity. In this paper, we design such algorithms in the presence of switching cost - the latter penalizes the typical optimism in adaptive algorithms, leading to a delicate design trade-off. Based on a novel dual space scaling strategy discovered by a continuous-time analysis, we propose a simple algorithm that improves the existing comparator adaptive regret bound [ZCP22a] to the optimal rate. The obtained benefits are further extended to the expert setting, and the practicality of the proposed algorithm is demonstrated through a sequential investment task.


Task-Free Continual Learning via Online Discrepancy Distance Learning

Neural Information Processing Systems

Learning from non-stationary data streams, also called Task-Free Continual Learning (TFCL) remains challenging due to the absence of explicit task information in most applications. Even though recently some algorithms have been proposed for TFCL, these methods lack theoretical guarantees. Moreover, there are no theoretical studies about forgetting during TFCL. This paper develops a new theoretical analysis framework that derives generalization bounds based on the discrepancy distance between the visited samples and the entire information made available for training the model. This analysis provides new insights into the forgetting behaviour in classification tasks.


Self-Explanation in Social AI Agents

arXiv.org Artificial Intelligence

For example, in online learning, an AI social assistant may connect learners and thereby enhance social interaction. These social AI assistants too need to explain themselves in order to enhance transparency and trust with the learners. We present a method of self-explanation that uses introspection over a self-model of an AI social assistant. The self-model is captured as a functional model that specifies how the methods of the agent use knowledge to achieve its tasks. The process of generating self-explanations uses Chain of Thought to reflect on the self-model and ChatGPT to provide explanations about its functioning. We evaluate the self-explanation of the AI social assistant for completeness and correctness. We also report on its deployment in a live class.


Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning

Neural Information Processing Systems

We analyze new online gradient descent algorithms for distributed systems with large delays between gradient computations and the corresponding updates. Using insights from adaptive gradient methods, we develop algorithms that adapt not only to the sequence of gradients, but also to the precise update delays that occur. We first give an impractical algorithm that achieves a regret bound that precisely quantifies the impact of the delays. We then analyze AdaptiveRevision, an algorithm that is efficiently implementable and achieves comparable guarantees. The key algorithmic technique is appropriately and efficiently revising the learning rate used for previous gradient steps.


Non-convex online learning via algorithmic equivalence

Neural Information Processing Systems

We study an algorithmic equivalence technique between non-convex gradient descent and convex mirror descent. We start by looking at a harder problem of regret minimization in online non-convex optimization. We show that under certain geometric and smoothness conditions, online gradient descent applied to non-convex functions is an approximation of online mirror descent applied to convex functions under reparameterization. In continuous time, the gradient flow with this reparameterization was shown to be \emph{exactly} equivalent to continuous-time mirror descent by Amid and Warmuth, but theory for the analogous discrete time algorithms is left as an open problem. We prove an O(T {\frac{2}{3}}) regret bound for non-convex online gradient descent in this setting, answering this open problem.