Pathological Effects of Variance on Classification-Based Policy Iteration

Pires, Bernardo Ávila (University of Alberta) | Szepesvári, Csaba (University of Alberta)

Mar-1-2015–AAAI Conferences

We carry out an empirical study of classification-based policy iteration (CBPI) in a simplified Markov decision process (MDP). In this simple MDP, we expose pathological cases in which variance in the state-action value estimates degrades the performance of CBPI to the point of complete ineffectiveness. In particular, we show that with enough variance in the returns, e.g., when state-action values are estimated from a single rollout, CBPI drifts away from an optimal policy over iterations, even when its initial policy is optimal. Our investigation also leads to a natural cost-sensitive classification problem in which the costs are noisy, a problem that, to the best of our knowledge, has not been studied in the classification literature.
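The effect of rollout variance on the labels that CBPI feeds to its classifier can be illustrated with a small simulation. The sketch below is not the MDP or the experimental setup studied in the paper: it assumes a single state with two actions, Gaussian noise on the returns, and hypothetical names (`rollout_estimate`, `mislabel_rate`, `gap`, `noise_std`). It only estimates how often a greedy choice over noisy value estimates selects the worse action.

```python
import random


def rollout_estimate(true_value, noise_std, n_rollouts, rng):
    """Average n_rollouts noisy returns (Gaussian noise is an assumption)."""
    total = 0.0
    for _ in range(n_rollouts):
        total += rng.gauss(true_value, noise_std)
    return total / n_rollouts


def mislabel_rate(gap, noise_std, n_rollouts, n_trials=10000, seed=0):
    """Fraction of trials in which the greedy label is the worse action.

    Assumed toy setting: one state, two actions with true values
    0.0 (worse) and `gap` (better).
    """
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_trials):
        q_worse = rollout_estimate(0.0, noise_std, n_rollouts, rng)
        q_better = rollout_estimate(gap, noise_std, n_rollouts, rng)
        # The greedy label is wrong when the worse action looks at least as good.
        if q_worse >= q_better:
            wrong += 1
    return wrong / n_trials


if __name__ == "__main__":
    # Compare a single rollout against averages over more rollouts.
    for n in (1, 10, 100):
        print(n, mislabel_rate(gap=0.1, noise_std=1.0, n_rollouts=n))
```

Under these assumptions, a single rollout with noise much larger than the action gap yields labels that are wrong almost half the time, which is the kind of noisy-cost classification input the abstract refers to. Averaging more rollouts lowers the mislabel rate.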

cbpi, iteration, variance


Country:
- North America
  - Canada > Alberta (0.14)
  - United States > New York
    - New York County > New York City (0.04)
- Asia > Middle East
  - Israel > Haifa District > Haifa (0.04)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
