Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time