Towards Safe Policy Improvement for Non-Stationary MDPs

Chandak, Yash, Jordan, Scott M., Theocharous, Georgios, White, Martha, Thomas, Philip S.

Oct-23-2020–arXiv.org Artificial Intelligence

Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks. While several works in the past have proposed methods that are safe for deployment, they assume that the underlying problem is stationary. However, many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable. We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems. Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis. Safety is ensured using sequential hypothesis testing of a policy's forecasted performance, and confidence intervals are obtained using wild bootstrap.
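The safety check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a list of per-episode performance estimates for a candidate policy is already available (for example from off-policy evaluation), fits a polynomial trend over time by least squares, forecasts the next episode, and uses a wild bootstrap with Rademacher sign flips to get a simple percentile lower bound. The function names, the polynomial basis, the percentile interval, and the single (non-sequential) test are all simplifying assumptions made for illustration.

```python
import numpy as np


def forecast_lower_bound(returns, degree=1, alpha=0.05, n_boot=2000, seed=0):
    """Forecast next-episode performance and a wild-bootstrap lower bound.

    Hypothetical helper: `returns` holds past performance estimates of one
    policy, ordered by episode. Returns (point_forecast, lower_bound).
    """
    y = np.asarray(returns, dtype=float)
    k = len(y)

    # Polynomial-in-time design matrix; time is rescaled to (0, 1].
    t = np.arange(1, k + 1) / k
    X = np.vander(t, degree + 1, increasing=True)
    x_next = np.vander(np.array([(k + 1) / k]), degree + 1, increasing=True)[0]

    # Ordinary least-squares trend fit and point forecast for episode k + 1.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    resid = y - fitted
    forecast = float(x_next @ coef)

    # Wild bootstrap: flip each residual's sign at random, rebuild a
    # pseudo-series around the fitted trend, refit, and re-forecast.
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_boot, k))
    y_star = fitted + signs * resid                      # shape (n_boot, k)
    coef_star, *_ = np.linalg.lstsq(X, y_star.T, rcond=None)
    boot_forecasts = x_next @ coef_star                  # shape (n_boot,)

    # Simplified percentile lower bound at level alpha (an assumption here).
    lower = float(np.quantile(boot_forecasts, alpha))
    return forecast, lower


def passes_safety_test(returns, baseline_performance, **kwargs):
    """Accept the candidate only if its forecast lower bound beats the baseline."""
    _, lower = forecast_lower_bound(returns, **kwargs)
    return lower >= baseline_performance
```

A call such as `passes_safety_test(candidate_returns, baseline_performance=6.5)` would deploy the candidate policy only when the lower bound on its forecasted performance is at least the baseline value; otherwise the existing policy is kept. Sign flips preserve each residual's magnitude, which is why the wild bootstrap is a common choice when the noise level varies across episodes.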

algorithm, diabetes, us government

Country:
- North America
  - Canada (0.67)
  - United States (1.00)

Genre:
- Research Report > Experimental Study (0.46)

Industry:
- Energy (0.67)
- Government > Regional Government
  - North America Government > United States Government > FDA (0.46)
- Health & Medicine > Therapeutic Area
  - Endocrinology > Diabetes (1.00)
- Information Technology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Statistical Learning (0.88)
  - Representation & Reasoning (1.00)
