Optimal policy evaluation using kernel-based temporal difference methods
