A Bandit Regret Bound Analysis
Neural Information Processing Systems
Before diving into the details, we first explain the overall idea and structure of our proof. After that, we prove Lemma 2. The first term of (18) comes from (10), and the second term follows from the Cauchy inequality. The main structure of this proof is similar to that of Proposition 3, Section C of the Eluder dimension paper, and we only point out the subtle details that make the difference. In addition to the notation in Section 3, we introduce more symbols for the regret analysis.
B.1 Main Proof Sketch
The overall structure is similar to the bandit case; the main difference is that we need to take care of the transition dynamics.
Mar-21-2025, 12:10:11 GMT