Contents of Appendix
A Extended Literature Review
B Time Uniform Lasso Analysis
C Results on Exploration
  C.1 ALE
  C.2 Proof of Results on Exploration
D Proof of Regret Bound

Neural Information Processing Systems 

We present the bounds in terms of d and M for coherence with the rest of the text, assuming that M = O(p), which is the case when d ≪ p. Table 2 compares recent work on sparse linear bandits based on a number of important factors. The regret bounds in Table 2 are simplified to the terms with the largest rate of growth; the reader should check the corresponding papers for rigorous results. Some of the mentioned bounds depend on problem-dependent parameters. To indicate such parameters we use τ in Table 2, following the notation of Hao et al. [2020]. Note that τ varies across the rows of the table, and is just an indicator for the existence of other terms.
