Exterior Penalty Policy Optimization with Penalty Metric Network under Constraints