Zeroth-Order Supervised Policy Improvement