Finite-time Convergence Analysis of Actor-Critic with Evolving Reward

Oct-15-2025–arXiv.org Artificial Intelligence

Many popular practical reinforcement learning (RL) algorithms employ evolving reward functions-through techniques such as reward shaping, entropy regularization, or curriculum learning-yet their theoretical foundations remain underdeveloped. This paper provides the first finite-time convergence analysis of a single-timescale actor-critic algorithm in the presence of an evolving reward function under Markovian sampling. We consider a setting where the reward parameters may change at each time step, affecting both policy optimization and value estimation. Under standard assumptions, we derive non-asymptotic bounds for both actor and critic errors. Our result shows that an $O(1/\sqrt{T})$ convergence rate is achievable, matching the best-known rate for static rewards, provided the reward parameters evolve slowly enough. This rate is preserved when the reward is updated via a gradient-based rule with bounded gradient and on the same timescale as the actor and critic, offering a theoretical foundation for many popular RL techniques. As a secondary contribution, we introduce a novel analysis of distribution mismatch under Markovian sampling, improving the best-known rate by a factor of $\log^2T$ in the static-reward case.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

Oct-15-2025

arXiv.org PDF

Add feedback

Country:
- Oceania
  - Palau (0.04)
  - Australia > New South Wales
    - Sydney (0.04)
- North America
  - United States
    - District of Columbia > Washington (0.04)
    - New York > New York County
      - New York City (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - Colorado > Denver County
      - Denver (0.04)
    - California
      - San Francisco County > San Francisco (0.14)
      - Los Angeles County > Long Beach (0.04)
  - Puerto Rico > San Juan
    - San Juan (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Europe
  - Austria > Vienna (0.14)
  - Czechia > Prague (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Spain > Valencian Community
    - Valencia Province > Valencia (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report > New Finding (0.54)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Reinforcement Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found