Nonstationary Reinforcement Learning with Linear Function Approximation

Zhou, Huozhi, Chen, Jinglin, Varshney, Lav R., Jagmohan, Ashish

Oct-15-2020–arXiv.org Machine Learning

We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs) with linear function approximation under drifting environment. Specifically, both the reward and state transition functions can evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain \textit{variation budgets}. We first develop the $\texttt{LSVI-UCB-Restart}$ algorithm, an optimistic modification of least-squares value iteration combined with periodic restart, and establish its dynamic regret bound when variation budgets are known. We then propose a parameter-free algorithm, $\texttt{Ada-LSVI-UCB-Restart}$, that works without knowing the variation budgets, but with a slightly worse dynamic regret bound. We also derive the first minimax dynamic regret lower bound for nonstationary MDPs to show that our proposed algorithms are near-optimal. As a byproduct, we establish a minimax regret lower bound for linear MDPs, which is unsolved by \cite{jin2020provably}. In addition, we provide numerical experiments to demonstrate the effectiveness of our proposed algorithms. As far as we know, this is the first dynamic regret analysis in nonstationary reinforcement learning with function approximation.

dynamic regret, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

Oct-15-2020

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New Jersey > Hudson County
    - Hoboken (0.04)
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - Illinois > Champaign County
    - Urbana (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.14)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.50)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.58)
- Information Technology (0.46)
- Leisure & Entertainment > Games (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty
    - Fuzzy Logic (0.81)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.34)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found