Provably Correct Optimization and Exploration with Non-linear Policies