a3bfb116214815682a0d0d88ea95cd12-Paper-Conference.pdf

Neural Information Processing Systems 

A key question in reinforcement learning is how to use value-function approximation to arrive at scaleable algorithms that can find near-optimal policies in Markov decision processes (MDPs).