A Contextual Bandit Approach for Learning to Plan in Environments with Probabilistic Goal Configurations