Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
