Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation