Total stochastic gradient algorithms and applications in reinforcement learning