Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning

Jun-14-2026, 01:28:18 GMT–Neural Information Processing Systems

Motivated by real-world settings where data collection and policy deployment, whether for a single agent or across multiple agents, are costly, we study on-policy single-agent reinforcement learning (RL) and federated RL (FRL), focusing on minimizing burn-in costs (the sample sizes needed to reach near-optimal regret) and policy switching or communication costs. In parallel finite-horizon episodic Markov Decision Processes (MDPs) with $S$ states and $A$ actions, existing methods either require burn-in costs that are superlinear in $S$ and $A$ or fail to achieve logarithmic switching or communication costs.
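To make "logarithmic switching cost" concrete, here is a minimal sketch of a generic doubling trigger, a standard device in low-switching RL. It is not necessarily the mechanism used in this work. The idea is that an update, and hence a possible policy switch, is allowed only when some (state, action) visit count reaches 1, 2, 4, 8, ..., so each pair can trigger at most $\lfloor \log_2 T \rfloor + 1$ times over $T$ steps. The helper names (`simulate_switches`, `doubling_bound`) and the uniform-random visit pattern are hypothetical, chosen only for illustration.

```python
import random


def is_power_of_two(n: int) -> bool:
    """True when n is 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def simulate_switches(num_states: int, num_actions: int, num_steps: int, seed: int = 0):
    """Count update triggers under a hypothetical doubling rule.

    Visits are drawn uniformly at random (an illustrative assumption).
    A trigger fires whenever a (state, action) pair's visit count reaches
    a power of two; each trigger is an opportunity to switch policy.
    """
    rng = random.Random(seed)
    counts = {}
    triggers = 0
    for _ in range(num_steps):
        pair = (rng.randrange(num_states), rng.randrange(num_actions))
        counts[pair] = counts.get(pair, 0) + 1
        if is_power_of_two(counts[pair]):
            triggers += 1
    return triggers, counts


def doubling_bound(num_states: int, num_actions: int, num_steps: int) -> int:
    """Upper bound S * A * (floor(log2 T) + 1) on the number of triggers."""
    # int.bit_length() equals floor(log2 n) + 1 for n >= 1.
    return num_states * num_actions * num_steps.bit_length()


if __name__ == "__main__":
    S, A, T = 5, 3, 10_000
    triggers, _ = simulate_switches(S, A, T)
    print(f"triggers={triggers}, bound={doubling_bound(S, A, T)}, per-step updates={T}")
```

Triggers only upper-bound the actual number of policy changes, since an update need not change the greedy action. If counts are kept per step $h$, as is common in finite-horizon Q-learning, the same argument gives a bound with an extra factor of $H$. In both cases the count grows logarithmically in $T$, rather than linearly as it would if the policy were updated every step.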

artificial intelligence, machine learning, reinforcement learning
