Discrete-to-Deep Supervised Policy Learning