Methods

Neural Information Processing Systems 

We used these rewards to update the network's outputs as follows.