Provably Safe Reinforcement Learning with Step-wise Violation Constraints