lemma 8
Contents of Appendix: A Extended Literature Review; B Time Uniform Lasso Analysis; C Results on Exploration; C.1 ALE
Table 2 compares recent work on sparse linear bandits along a number of important factors. Some of the mentioned bounds depend on problem-dependent parameters (e.g. ...). Carpentier and Munos [2012] assume that the action set is a Euclidean ball and that the noise is added directly to the parameter vector, i.e. ... In this setting, Carpentier and Munos [2012] present an O(d√n) regret bound. Li et al. [2022] require a stronger condition ... This is generally not true, but may hold with high probability.
paper
In this section we provide a detailed proof of the main theorem. First we state some facts about the learning rate and the algorithm. ... This bound contains three parts. The first is an upper bound for the first step, when there is no data. ... The third part is an "average" of the estimated future regret.