Optimal Strong Regret and Violation in Constrained MDPs via Policy Optimization
