Agents
Off-PolicyEvaluationforAction-Dependent Non-StationaryEnvironments
Methods for sequential decision making are often built upon a foundational assumption that the underlying decision process is stationary [Sutton and Barto, 2018]. While this assumption was a cornerstone when laying the theoretical foundations of the field, and while is often reasonable, it isseldom trueinpractice andcanbeunreasonable [Dulac-Arnold etal.,2019].