Training Deep Reactive Policies for Probabilistic Planning Problems