Finite-time Convergence Analysis of Actor-Critic with Evolving Reward
Hu, Rui, Chen, Yu, Huang, Longbo
arXiv.org Artificial Intelligence
Many popular practical reinforcement learning (RL) algorithms employ evolving reward functions (through techniques such as reward shaping, entropy regularization, or curriculum learning), yet their theoretical foundations remain underdeveloped. This paper provides the first finite-time convergence analysis of a single-timescale actor-critic algorithm in the presence of an evolving reward function under Markovian sampling. We consider a setting where the reward parameters may change at each time step, affecting both policy optimization and value estimation. Under standard assumptions, we derive non-asymptotic bounds for both actor and critic errors. Our result shows that an $O(1/\sqrt{T})$ convergence rate is achievable, matching the best-known rate for static rewards, provided the reward parameters evolve slowly enough. This rate is preserved when the reward is updated via a gradient-based rule with bounded gradient and on the same timescale as the actor and critic, offering a theoretical foundation for many popular RL techniques. As a secondary contribution, we introduce a novel analysis of distribution mismatch under Markovian sampling, improving the best-known rate by a factor of $\log^2 T$ in the static-reward case.
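To make the setting concrete, here is a minimal sketch of a single-timescale actor-critic loop in which the reward itself changes at every step. It is not the paper's algorithm: the two-state toy MDP, the shaping bonus on action 0, the clipped decay rule for the `bonus` weight, and the function names are all hypothetical choices made for illustration. What it does mirror from the abstract is that the actor, the critic, and the reward parameter are updated along a single Markovian trajectory with step sizes of the same order ($1/\sqrt{T}$), and that the reward parameter moves by a bounded amount per step.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of logits."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def run_actor_critic(T=5000, seed=0):
    """Toy single-timescale actor-critic with an evolving shaped reward.

    Assumptions (all hypothetical, for illustration only): a 2-state,
    2-action MDP, a tabular critic, a softmax actor, and a shaping bonus
    whose weight decays via a clipped gradient step.
    """
    rng = random.Random(seed)
    n_states, n_actions, gamma = 2, 2, 0.9
    step = 1.0 / math.sqrt(T)  # same order for actor, critic, and reward
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor logits
    value = [0.0] * n_states                              # critic V(s)
    bonus = 1.0                                           # reward parameter
    s = 0
    for _ in range(T):
        probs = softmax(theta[s])
        a = rng.choices(range(n_actions), weights=probs)[0]
        # Toy dynamics: the chosen action names the likely next state.
        s_next = a if rng.random() < 0.8 else 1 - a
        base_reward = 1.0 if s_next == 1 else 0.0
        # Evolving reward: base reward plus a shaping bonus on action 0.
        reward = base_reward + (bonus if a == 0 else 0.0)
        # Critic: TD(0) update along the single trajectory.
        td_error = reward + gamma * value[s_next] - value[s]
        value[s] += step * td_error
        # Actor: policy-gradient step using the TD error as the advantage.
        for b in range(n_actions):
            grad_log = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += step * td_error * grad_log
        # Reward parameter: gradient step on 0.5 * bonus**2 with the
        # gradient clipped to [-1, 1], so each change is at most `step`.
        bonus -= step * max(-1.0, min(1.0, bonus))
        s = s_next
    return theta, value, bonus
```

Calling `run_actor_critic(T=2000)` returns the final actor logits, the critic table, and the shaping weight, which by then has decayed to nearly zero. The clipping is what keeps the per-step drift of the reward bounded by the step size, which is the kind of "slow enough" evolution the abstract refers to; the precise conditions are those stated in the paper, not the ones in this sketch.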
Oct-15-2025
- Genre:
- Research Report > New Finding (0.54)
- Technology: