Learning Unknown Markov Decision Processes: A Thompson Sampling Approach