Methods

To study how real-world agents may learn to make sequential decisions, we built models

Neural Information Processing Systems 

We used these rewards to update the network's outputs as follows.