Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees