A Bandit Regret Bound Analysis
A.1 Algorithm Procedure
At each round [t], after performing a list of actions {A

Neural Information Processing Systems 

In this section, we give a theoretical guarantee for the performance of our algorithm.

Lemma 0. Fix any sequence of confidence sets

After that, we prove Lemma 2.

The first term of (18) comes from (10), and the second term follows from the Cauchy-Schwarz inequality.

The main structure of this proof is similar to Proposition 3, Section C, in Eluder dimension's

Apart from the notation introduced in Section 3, we add further symbols for the regret analysis.

According to Assumption 2.2, we know that

By Lemma 6 in [?], we have sup

Next, we bound the two terms in (58).

Combining all of the inequalities above, we conclude that the lemma holds.
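A common form of the Cauchy(-Schwarz) step in regret proofs of this kind bounds a sum of per-round terms by the square root of T times their sum of squares: sum_{t=1}^T w_t <= sqrt(T * sum_{t=1}^T w_t^2), which follows by pairing the vector (w_1, ..., w_T) with the all-ones vector. Whether the second term of (18) uses exactly this form is an assumption. The sketch below only checks the inequality numerically; the function name `cauchy_schwarz_sum_bound` and the per-round widths are hypothetical illustrations, not quantities from the paper's algorithm.

```python
import math
import random


def cauchy_schwarz_sum_bound(widths):
    """Return (lhs, rhs) with lhs = sum_t w_t and rhs = sqrt(T * sum_t w_t^2).

    Applying Cauchy-Schwarz to (w_1, ..., w_T) and the all-ones vector
    gives lhs <= rhs for any real sequence of length T.
    """
    T = len(widths)
    lhs = sum(widths)
    rhs = math.sqrt(T * sum(w * w for w in widths))
    return lhs, rhs


# Hypothetical per-round widths w_t >= 0 (illustration only; these are
# NOT the confidence widths defined in the paper).
random.seed(0)
widths = [random.random() / math.sqrt(t) for t in range(1, 1001)]

lhs, rhs = cauchy_schwarz_sum_bound(widths)
print(f"sum of widths = {lhs:.3f}, Cauchy-Schwarz bound = {rhs:.3f}")
```

The bound is tight exactly when all w_t are equal, which is why it is typically paired with a separate bound on sum_t w_t^2 to obtain a sqrt(T)-type regret rate.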
