Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes