Supplementary Material A Task Details

Neural Information Processing Systems 

There are 14 tasks in total: 10 prediction tasks and 4 bandit tasks. A prediction task proceeds as follows. The interaction protocol for bandit tasks is as follows. The agent's return is the discounted sum of rewards. Our Bayes-optimal agents act and predict according to the standard models in the literature. For a full list of update and prediction rules, see Table 1.
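As an illustration of the return mentioned above, the following is a minimal sketch of a discounted sum of rewards. The function name `discounted_return` and the discount-factor argument `gamma` are assumptions made for this example; the excerpt does not state the discount value or the episode length used in the bandit tasks.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute sum over t of gamma**t * rewards[t], with t starting at 0.

    `gamma` is a hypothetical discount factor in [0, 1]; the value used
    by the authors is not given in this excerpt.
    """
    total = 0.0
    # Accumulate from the last reward backwards (Horner's scheme), so each
    # earlier step multiplies everything after it by gamma exactly once.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With `gamma = 1.0` this reduces to the plain sum of rewards, and with `gamma = 0.0` only the first reward counts.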
