Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result

Paul Wagner

Neural Information Processing Systems http://nips.cc/