Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds
Neural Information Processing Systems
While numerous works have focused on devising efficient algorithms for reinforcement learning (RL) with uniformly bounded rewards, it remains an open question whether sample- or time-efficient algorithms exist for RL with large state-action spaces when the rewards are \emph{heavy-tailed}, i.e., admit only finite $(1+\epsilon)$-th moments for some $\epsilon \in (0,1]$. In this work, we address the challenge of such rewards in RL with linear function approximation. Here, $d$ is the feature dimension, and $\nu_t^{1+\epsilon}$ is the $(1+\epsilon)$-th central moment of the reward at the $t$-th round. We further show that the above bound is minimax optimal when applied to the worst-case instances in stochastic and deterministic linear bandits. We then extend this algorithm to the RL setting with linear function approximation.
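To make the heavy-tailed assumption concrete, the following sketch (an illustration, not part of the paper) draws rewards from a Pareto distribution with shape parameter $\alpha = 1.5$: such rewards have infinite variance, yet their $(1+\epsilon)$-th central moment is finite for any $\epsilon < \alpha - 1$, which is exactly the regime $\epsilon \in (0,1]$ the abstract describes. The variable names (`alpha`, `eps`) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pareto(alpha) with alpha = 1.5: E|X|^p is finite iff p < alpha,
# so the (1+eps)-th moment exists for eps < 0.5, but the variance
# (p = 2) does not -- the classic heavy-tailed reward regime.
alpha = 1.5
n = 200_000
rewards = rng.pareto(alpha, size=n) + 1.0  # support on [1, inf)

eps = 0.4  # 1 + eps < alpha, so this central moment is finite
moment = np.mean(np.abs(rewards - rewards.mean()) ** (1 + eps))
print(f"empirical (1+eps)-th central moment: {moment:.3f}")
```

Empirically, the $(1+\epsilon)$-th central moment above stabilizes as `n` grows, whereas the sample variance of the same draws keeps drifting upward, which is why bounded-variance analyses break down here.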