On the Theory of Reinforcement Learning with Once-per-Episode Feedback

Neural Information Processing Systems 

This formulation of RL has had significant empirical success in the recent past [24, 23, 33, 32].