Reinforcement Learning under Drift

Cheung, Wang Chi, Simchi-Levi, David, Zhu, Ruihao

Jun-7-2019–arXiv.org Machine Learning

Consider a discrete-time Markov decision process (MDP) where a decision-maker (DM) interacts with a system iteratively: in each round, the DM first observes the current state of the system, and then picks an available action. Afterwards, it receives an immediate random reward, and the system transitions to the next state according to some state transition distribution. The reward distribution and the state transition distribution depend on the current state and the chosen action, but are independent of all previous states and actions. The goal of the DM is to maximize its cumulative reward under the following challenges:
- Uncertainty: the reward and the state transition distributions are initially unknown to the DM.
- Non-stationarity: the environment is non-stationary, and both the reward distributions and the state transition distributions can evolve over time.
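The interaction protocol described above can be illustrated with a minimal Python sketch. It is not the authors' method: the class `DriftingMDP`, the helper `run_episode`, the linear drift rule, the Bernoulli rewards, and all parameter names are assumptions introduced purely for illustration. The sketch only shows a round-by-round loop in which the DM observes a state, picks an action, receives a random reward, and the system moves to a next state, while both distributions change with time.

```python
import random


class DriftingMDP:
    """Toy tabular MDP whose reward and transition distributions drift over time.

    Hypothetical illustration only; the drift rule is an assumption, not taken
    from the paper.
    """

    def __init__(self, n_states, n_actions, drift=0.01, seed=0):
        if n_states < 2:
            raise ValueError("need at least two states")
        self.n_states = n_states
        self.n_actions = n_actions
        self.drift = drift
        self.rng = random.Random(seed)
        self.t = 0          # current round
        self.state = 0      # current state
        # Base mean reward in [0, 1) for every (state, action) pair.
        self.base_mean = [
            [self.rng.random() for _ in range(n_actions)]
            for _ in range(n_states)
        ]

    def mean_reward(self, s, a):
        # Mean reward drifts linearly in time and is clipped to [0, 1].
        direction = 1 if (s + a) % 2 == 0 else -1
        mean = self.base_mean[s][a] + direction * self.drift * self.t
        return min(1.0, max(0.0, mean))

    def transition_probs(self, s, a):
        # The "preferred" next state depends on (s, a); its probability
        # drifts upward over time, and the rest is spread uniformly.
        preferred = (s + a) % self.n_states
        p = min(0.9, 0.5 + self.drift * self.t)
        other = (1.0 - p) / (self.n_states - 1)
        return [p if s2 == preferred else other for s2 in range(self.n_states)]

    def step(self, a):
        s = self.state
        # Bernoulli reward with the current (time-dependent) mean.
        reward = 1.0 if self.rng.random() < self.mean_reward(s, a) else 0.0
        probs = self.transition_probs(s, a)
        self.state = self.rng.choices(range(self.n_states), weights=probs)[0]
        self.t += 1
        return reward, self.state


def run_episode(env, horizon, policy):
    """Run `horizon` rounds; `policy` maps the observed state to an action."""
    total = 0.0
    for _ in range(horizon):
        action = policy(env.state)   # DM observes the state, picks an action
        reward, _ = env.step(action)  # reward is received, system transitions
        total += reward
    return total


if __name__ == "__main__":
    env = DriftingMDP(n_states=3, n_actions=2)
    picker = random.Random(1)
    # A uniformly random policy, standing in for a learning algorithm.
    print(run_episode(env, horizon=100, policy=lambda s: picker.randrange(2)))
```

The random policy is only a placeholder: a DM facing the two challenges above would have to estimate the unknown distributions from data and discount or discard old observations as the environment drifts.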

data mining, machine learning, reinforcement learning

Country:
- Asia > Singapore (0.04)
- North America
  - United States
    - Massachusetts > Middlesex County
      - Cambridge (0.14)
    - California > Santa Clara County
      - Palo Alto (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.46)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning > Reinforcement Learning (0.51)
