Contents of Appendix
A Extended Literature Review
B Time Uniform Lasso Analysis
C Results on Exploration
C.1 ALE
C.2 Proof of Results on Exploration
D Proof of Regret Bound
–Neural Information Processing Systems
We present the bounds in terms of d and M for coherence with the rest of the text, assuming that M = O(p), which is the case when d ≪ p. Table 2 compares recent work on sparse linear bandits along a number of important factors. The regret bounds in Table 2 are simplified to the terms with the largest rate of growth; the reader should consult the corresponding papers for rigorous results. Some of the mentioned bounds depend on problem-dependent parameters. To indicate such parameters, Table 2 uses a dedicated symbol, following the notation of Hao et al. [2020]. Note that this symbol varies across the rows of the table and merely indicates the presence of other terms.
Feb-10-2025, 06:06:42 GMT