Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
Liu, Yao, Bacon, Pierre-Luc, Brunskill, Emma
Due in part to the growing sources of data about past sequences of decisions and their outcomes - from marketing to energy management to healthcare - there is increasing interest in developing accurate and efficient algorithms for off-policy policy evaluation. For Markov Decision Processes, this problem was addressed (Precup et al., 2000) early on by importance sampling (IS)(Rubinstein, 1981), a method prone to large variance due to rare events (Glynn, 1994; L'Ecuyer et al., 2009). The per-decision importance sampling estimator of Precup et al. (2000) tries to mitigate this problem by leveraging the temporal structure - earlier rewards cannot depend on later decisions - of the domain. While neither importance sampling (IS) nor per-decision IS (PDIS) assumes the underlying domain is Markov, more recently, a new class of estimators (Hallak and Mannor, 2017; Liu et al., 2018; Gelada and Bellemare, 2019) has been proposed that leverages the Markovian structure. In particular, these approaches propose performing importance sampling over the stationary state-action distributions induced by the corresponding Markov chain for a particular policy. By avoiding the explicit accumulation of likelihood ratios along the trajectories, it is hypothesized that such ratios of stationary distributions could substantially reduce the variance of the resulting estimator, thereby overcoming the "curse of horizon" (Liu et al., 2018) plaguing off-policy evaluation. The recent flurry of empirical results shows significant performance improvements over the alternative methods on a variety of simulation domains. Yet so far there has not been a formal analysis of the accuracy of IS, PDIS, and stationary state-action IS which will strengthen our understanding of their properties, benefits and limitations.
Oct-14-2019
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America
- Canada (0.04)
- United States
- New York > New York County
- New York City (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- California
- New York > New York County
- Asia > Japan
- Oceania > Australia
- Genre:
- Research Report > New Finding (0.48)
- Industry:
- Energy (0.35)
- Technology: