Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version