Appendix

Neural Information Processing Systems 

Problem with selecting oracles based on initial value. Alternatively, we can switch between the oracles once to get a reward of 3/4, or twice to get the optimal reward of 1. All terminal states not shown give a reward of 0, and intermediate states have no rewards. The optimal terminal state is outlined in bold. Consequently, it goes right and eventually obtains a suboptimal reward of 3/4.
