Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version
