Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result