Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs
