First-order Sobolev Reinforcement Learning