Randomized Policy Learning for Continuous State and Action MDPs