Pathological Effects of Variance on Classification-Based Policy Iteration
Pires, Bernardo Ávila (University of Alberta) | Szepesvári, Csaba (University of Alberta)
We carry out an empirical study of classification-based policy iteration (CBPI) in a simplified Markov decision process (MDP). In this simple MDP, we expose pathological cases in which variance in the state-action value estimates degrades the performance of CBPI to the point of complete ineffectiveness. In particular, we show that with enough variance in the returns, e.g., when state-action values are estimated from a single rollout, CBPI drifts away from an optimal policy over iterations, even when it is initialized with an optimal policy. Our investigation also leads to a natural cost-sensitive classification problem in which the costs are noisy, a problem that, to the best of our knowledge, has not been studied in the classification literature.
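The role of rollout variance can be illustrated with a minimal sketch. This is not the MDP or the experiment from the paper, and it does not run CBPI itself. It covers only the labeling step, in which the greedy action is chosen from noisy rollout averages and then passed to the classifier as a training label. The single-state, two-action setup, the Gaussian noise model, and the names `noisy_greedy_error_rate`, `gap`, `noise_std`, and `n_rollouts` are all assumptions made for illustration.

```python
import random


def noisy_greedy_error_rate(gap, noise_std, n_rollouts, n_trials=10_000, seed=0):
    """Estimate how often noisy rollout averages pick the wrong greedy action.

    Hypothetical toy setup (not from the paper): one state, two actions,
    action 0 is better than action 1 by `gap`, and each rollout return is
    the true value plus zero-mean Gaussian noise with std `noise_std`.
    """
    rng = random.Random(seed)
    q_true = (gap, 0.0)  # true state-action values; action 0 is optimal
    errors = 0
    for _ in range(n_trials):
        # Average n_rollouts noisy returns per action to estimate its value.
        q_hat = [
            sum(rng.gauss(q, noise_std) for _ in range(n_rollouts)) / n_rollouts
            for q in q_true
        ]
        # The label handed to the classifier would be the argmax of q_hat;
        # count the trials in which that label is the suboptimal action.
        if q_hat[1] > q_hat[0]:
            errors += 1
    return errors / n_trials


if __name__ == "__main__":
    for n in (1, 10, 100):
        rate = noisy_greedy_error_rate(gap=0.1, noise_std=1.0, n_rollouts=n)
        print(f"rollouts={n:3d}  wrong-label rate={rate:.3f}")
```

Under these assumptions, a single rollout produces a wrong label in a large fraction of trials whenever the noise is large relative to the value gap, and averaging more rollouts lowers that fraction. A classifier trained on such labels would then be fitting a noisy target even if the current policy were already optimal.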