Maximizing the Probability of Arriving on Time: A Practical Q-Learning Method