Safe Reinforcement Learning with Minimal Supervision