Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

Chandak, Yash, Shankar, Shiv, Bastian, Nathaniel D., da Silva, Bruno Castro, Brunskill, Emma, Thomas, Philip S.

Jan-24-2023–arXiv.org Artificial Intelligence

Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary. This limits the application of such methods because real-world problems are often subject to changes due to external factors (passive non-stationarity), changes induced by interactions with the system itself (active non-stationarity), or both (hybrid non-stationarity). In this work, we take the first steps towards the fundamental challenge of on-policy and off-policy evaluation amidst structured changes due to active, passive, or hybrid non-stationarity. Towards this goal, we make a higher-order stationarity assumption: non-stationarity results in changes over time, but the way changes happen is fixed. We propose OPEN, an algorithm that uses a double application of counterfactual reasoning and a novel importance-weighted instrument-variable regression to obtain a lower-bias and lower-variance estimate of the structure in the changes of a policy's past performances. Finally, we show promising results on how OPEN can be used to predict future performances for several domains inspired by real-world applications that exhibit non-stationarity.
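
To make the two ingredients named in the abstract more concrete, here is a minimal sketch, not the OPEN algorithm itself. It assumes the simplest possible stand-ins: ordinary per-episode importance sampling for the counterfactual estimate of past performance, and a plain least-squares AR(1) fit in place of the importance-weighted instrument-variable regression. The function names (`is_estimate`, `fit_ar1`, `forecast_next`) and the data layout are hypothetical and do not come from the paper.

```python
# Illustrative sketch only: NOT the OPEN algorithm from the paper.
# All function names and the episode format are assumptions for this example.

def is_estimate(episode, pi_eval, pi_behavior):
    """Ordinary importance-sampling estimate of one episode's return.

    episode: list of (state, action, reward) tuples collected under pi_behavior.
    pi_eval, pi_behavior: callables (state, action) -> action probability.
    """
    weight = 1.0
    total_reward = 0.0
    for state, action, reward in episode:
        # Reweight by how much more (or less) likely the evaluation policy
        # was to take the logged action than the behavior policy.
        weight *= pi_eval(state, action) / pi_behavior(state, action)
        total_reward += reward
    return weight * total_reward


def fit_ar1(values):
    """Least-squares fit of values[i + 1] ~ a * values[i] + b."""
    xs, ys = values[:-1], values[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0.0:
        return 0.0, mean_y  # constant series: fall back to predicting the mean
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def forecast_next(values):
    """One-step-ahead forecast from the fitted AR(1) trend."""
    a, b = fit_ar1(values)
    return a * values[-1] + b
```

In this simplified picture, `is_estimate` would be applied to each past episode to build a sequence of performance estimates for the evaluation policy, and `forecast_next` would extrapolate that sequence one step ahead. Regressing noisy estimates on other noisy estimates with plain least squares is only a naive baseline; the abstract names an importance-weighted instrument-variable regression for this step.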

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America > United States
  - Massachusetts > Middlesex County > Cambridge (0.04)
- Europe
  - Poland (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report
  - New Finding (0.67)
  - Experimental Study (0.45)

Industry:
- Health & Medicine > Government Relations & Public Policy (0.67)
- Education (0.67)
- Government
  - Military (0.92)
  - Regional Government > North America Government
    - United States Government > FDA (0.46)

Technology:
- Information Technology
  - Data Science (0.93)
  - Artificial Intelligence
    - Representation & Reasoning > Agents (1.00)
    - Machine Learning
      - Reinforcement Learning (1.00)
      - Learning Graphical Models > Undirected Networks
        - Markov Models (0.47)
