AITopics | dynamic regret

Collaborating Authors

dynamic regret

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Gradient-Variation Regret Bounds for Unconstrained Online Learning

Zhao, Yuheng, Jacobsen, Andrew, Cesa-Bianchi, Nicolò, Zhao, Peng

arXiv.org Machine LearningApr-14-2026

We develop parameter-free algorithms for unconstrained online learning with regret guarantees that scale with the gradient variation $V_T(u) = \sum_{t=2}^T \|\nabla f_t(u)-\nabla f_{t-1}(u)\|^2$. For $L$-smooth convex loss, we provide fully-adaptive algorithms achieving regret of order $\widetilde{O}(\|u\|\sqrt{V_T(u)} + L\|u\|^2+G^4)$ without requiring prior knowledge of comparator norm $\|u\|$, Lipschitz constant $G$, or smoothness $L$. The update in each round can be computed efficiently via a closed-form expression. Our results extend to dynamic regret and find immediate implications to the stochastically-extended adversarial (SEA) model, which significantly improves upon the previous best-known result [Wang et al., 2025].

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2604.11151

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Online (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.62)

Add feedback

Parameter-Free Dynamic Regret for Unconstrained Linear Bandits

Rumi, Alberto, Jacobsen, Andrew, Cesa-Bianchi, Nicolò, Vitale, Fabio

arXiv.org Machine LearningMar-30-2026

We study dynamic regret minimization in unconstrained adversarial linear bandit problems. In this setting, a learner must minimize the cumulative loss relative to an arbitrary sequence of comparators $\boldsymbol{u}_1,\ldots,\boldsymbol{u}_T$ in $\mathbb{R}^d$, but receives only point-evaluation feedback on each round. We provide a simple approach to combining the guarantees of several bandit algorithms, allowing us to optimally adapt to the number of switches $S_T = \sum_t\mathbb{I}\{\boldsymbol{u}_t \neq \boldsymbol{u}_{t-1}\}$ of an arbitrary comparator sequence. In particular, we provide the first algorithm for linear bandits achieving the optimal regret guarantee of order $\mathcal{O}\big(\sqrt{d(1+S_T) T}\big)$ up to poly-logarithmic terms without prior knowledge of $S_T$, thus resolving a long-standing open problem.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

2603.25916

Country:

North America > United States (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Africa > Middle East > Morocco > Tanger-Tetouan-Al Hoceima Region > Tangier (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.55)

Add feedback

Adaptive Online Learning in Dynamic Environments

Neural Information Processing SystemsMar-16-2026, 17:59:40 GMT

In this paper, we study online convex optimization in dynamic environments, and aim to bound the dynamic regret with respect to any sequence of comparators. Existing work have shown that online gradient descent enjoys an $O(\sqrt{T}(1+P_T))$ dynamic regret, where $T$ is the number of iterations and $P_T$ is the path-length of the comparator sequence. However, this result is unsatisfactory, as there exists a large gap from the $\Omega(\sqrt{T(1+P_T)})$ lower bound established in our paper. To address this limitation, we develop a novel online method, namely adaptive learning for dynamic environment (Ader), which achieves an optimal $O(\sqrt{T(1+P_T)})$ dynamic regret. The basic idea is to maintain a set of experts, each attaining an optimal dynamic regret for a specific path-length, and combines them with an expert-tracking algorithm. Furthermore, we propose an improved Ader based on the surrogate loss, and in this way the number of gradient evaluations per round is reduced from $O(\log T)$ to $1$. Finally, we extend Ader to the setting that a sequence of dynamical models is available to characterize the comparators.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

MetaCURL: Non-stationary Concave Utility Reinforcement Learning Bianca Marin Moreno Inria

Neural Information Processing SystemsFeb-18-2026, 10:00:32 GMT

We explore online learning in episodic Markov decision processes on non-stationary environments (changing losses and probability transitions).

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education > Educational Setting > Online (0.48)
Energy > Power Industry (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Unconstrained Dynamic Regret via Sparse Coding

Neural Information Processing SystemsFeb-17-2026, 19:21:24 GMT

Nonstationarity is prevalent in sequential decision making, which poses a critical challenge to the vast majority of existing approaches developed offline.

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.94)

Add feedback

b304a04c6a4ac168aa692923814ace17-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 14:16:27 GMT

Proceedings of the International Conference on Machine Learning 2024

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Dynamic Regret of Adversarial Linear Mixture MDPs

Neural Information Processing SystemsFeb-16-2026, 21:18:10 GMT

We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-information rewards and the unknown transition kernel. We consider the linear mixture MDPs whose transition kernel is a linear mixture model and choose the dynamic regret as the performance measure.

dynamic regret, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

8eed150084dc3534f01ba63f9b7d32d2-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 14:16:59 GMT

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > China (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions

Neural Information Processing SystemsFeb-15-2026, 20:09:47 GMT

A notable example that motivates our work is the problem of minimizing costs (or maximizing rewards) in a single-trajectory Markov Decision Process (MDP).

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Transportation > Air (0.43)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs

Neural Information Processing SystemsFeb-15-2026, 11:55:51 GMT

The interaction is usually modeled as Markov Decision Processes (MDPs). Research on MDPs can be broadly divided into two lines based on the reward generation mechanism. The first line of work [Jaksch et al., 2010, Azar et al., 2013, 2017, He et al., 2021] considers the

dynamic regret, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: