Learning-Based Mean-Payoff Optimization in an Unknown MDP under Omega-Regular Constraints
