Examining average and discounted reward optimality criteria in reinforcement learning
