Appendix: Online Learning in Contextual Bandits using Gated Linear Networks

Neural Information Processing Systems 

We assume that our tree divides the bounded reward range [r_min, r_max] uniformly into 2^d bins at each level d ≤ D. By labelling the left branches of a node with 0 and the right branches with 1, we can associate a unique binary string b_{1:d} with any internal (d < D) or leaf (d = D) node in the tree. The dth element, when it exists, is denoted b_d. The root node is denoted by the empty string. We should note that even though this exponential term might initially seem discouraging, we set D = 3 in our experiments and observe no significant improvements for larger D. Algorithm 1, CTREE, performs regression utilizing a tree-based discretization, where nodes are composed of GLNs.
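To make the node labelling concrete, the following is a minimal sketch of how a binary string could be mapped to its reward bin and back. It is not the CTREE algorithm itself and involves no GLNs; the function names node_interval and leaf_path are hypothetical, and the half-open bin convention (a reward on a boundary goes to the right branch) is an assumption that the text does not specify.

```python
def node_interval(bits, r_min, r_max):
    """Return the reward sub-interval (lo, hi) covered by the node labelled `bits`.

    `bits` is a string of '0'/'1' characters; the empty string is the root,
    which covers the whole range [r_min, r_max].
    """
    lo, hi = r_min, r_max
    for b in bits:
        mid = (lo + hi) / 2.0
        if b == "0":      # left branch: keep the lower half
            hi = mid
        elif b == "1":    # right branch: keep the upper half
            lo = mid
        else:
            raise ValueError(f"invalid branch label: {b!r}")
    return lo, hi


def leaf_path(reward, r_min, r_max, depth):
    """Return the binary string b_{1:depth} of the leaf whose bin contains `reward`.

    Assumed convention: a reward exactly on a bin boundary goes right.
    """
    if not r_min <= reward <= r_max:
        raise ValueError("reward lies outside [r_min, r_max]")
    lo, hi = r_min, r_max
    bits = []
    for _ in range(depth):
        mid = (lo + hi) / 2.0
        if reward < mid:
            bits.append("0")
            hi = mid
        else:
            bits.append("1")
            lo = mid
    return "".join(bits)


if __name__ == "__main__":
    D = 3  # depth used in the experiments described above
    path = leaf_path(0.6, 0.0, 1.0, D)
    print(path, node_interval(path, 0.0, 1.0), 2 ** D)
```

With D = 3 the tree has 2^3 = 8 leaf bins, so a leaf path is only three branch decisions long.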
