Near-Optimal Regret Bounds for Model-Free RL in Non-Stationary Episodic MDPs

Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar

arXiv.org Artificial Intelligence 

Reinforcement learning (RL) studies the class of problems where an agent maximizes its cumulative reward through sequential interaction with an unknown but fixed environment, usually modeled by a Markov Decision Process (MDP). At each time step, the agent takes an action, receives a random reward drawn from a reward function, and then the environment transitions to a new state according to an unknown transition kernel. In classical RL problems, the transition kernel and the reward functions are assumed to be time-invariant. This stationary model, however, cannot capture the fact that in many real-world decision-making problems the environment, including both the transition dynamics and the reward functions, inherently evolves over time. Non-stationarity arises in a wide range of applications, including online advertisement auctions (Cai et al., 2017; Lu et al., 2019), dynamic pricing (Board, 2008; Chawla et al., 2016), traffic management (Chen et al., 2020), healthcare operations (Shortreed et al., 2011), and inventory control (Agrawal & Jia, 2019). Among these applications, we specifically emphasize two research areas that can significantly benefit from progress on non-stationary RL, yet whose connections to it have been largely overlooked in the literature. The first one is sequential transfer in RL (Tirinzoni et al., 2020), or multitask RL (Brunskill & Li, 2013).
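To make the interaction protocol above concrete, the following is a minimal, illustrative sketch of an episodic MDP whose reward function and transition kernel drift between episodes. It is not the paper's algorithm or environment; all names (`NonStationaryMDP`, `drift`, `run_episode`) and the specific drift model are hypothetical choices for exposition.

```python
import numpy as np


class NonStationaryMDP:
    """Tabular episodic MDP whose rewards and transitions drift between episodes.

    Illustrative sketch only; the drift model here is an arbitrary assumption.
    """

    def __init__(self, n_states=5, n_actions=3, horizon=10, drift=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.S, self.A, self.H = n_states, n_actions, horizon
        self.drift = drift
        # Mean rewards and transition kernel, both unknown to the agent.
        self.rewards = self.rng.uniform(size=(self.S, self.A))
        self.kernel = self.rng.dirichlet(np.ones(self.S), size=(self.S, self.A))

    def evolve(self):
        # Non-stationarity: perturb rewards and transitions slightly over time.
        self.rewards = np.clip(
            self.rewards + self.drift * self.rng.normal(size=self.rewards.shape), 0, 1
        )
        self.kernel += self.drift * self.rng.uniform(size=self.kernel.shape)
        self.kernel /= self.kernel.sum(axis=-1, keepdims=True)

    def step(self, state, action):
        # The agent receives a random reward, then transitions to a new state.
        reward = self.rng.binomial(1, self.rewards[state, action])
        next_state = self.rng.choice(self.S, p=self.kernel[state, action])
        return reward, next_state


def run_episode(env, policy):
    """One episode: act for H steps, then let the environment drift."""
    state, total = 0, 0.0
    for h in range(env.H):
        action = policy(state, h)
        reward, state = env.step(state, action)
        total += reward
    env.evolve()  # the dynamics the agent faces next episode have changed
    return total


if __name__ == "__main__":
    env = NonStationaryMDP()
    random_policy = lambda s, h: env.rng.integers(env.A)
    returns = [run_episode(env, random_policy) for _ in range(100)]
    print("average episodic return:", np.mean(returns))
```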
