Finding the Near Optimal Policy via Adaptive Reduced Regularization in MDPs