Results
–Neural Information Processing Systems
For any $\epsilon > 0$, the $\epsilon$-covering number of the Euclidean ball $B^d(R) := \{x \in \mathbb{R}^d : \|x\|_2 \le R\}$ with radius $R > 0$ in the Euclidean metric is upper bounded by $(1 + 2R/\epsilon)^d$.

Let $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_T$ be a filtration and let $X_1, X_2, \ldots, X_T$ be real random variables such that $X_t$ is $\mathcal{F}_t$-measurable, $\mathbb{E}[X_t \mid \mathcal{F}_{t-1}] = 0$, $|X_t| \le b$ almost surely, and $\sum_{t=1}^{T} \mathbb{E}[X_t^2 \mid \mathcal{F}_{t-1}] \le V$ for some fixed $V > 0$ and $b > 0$. Then for any $\delta \in (0,1)$, we have with probability at least $1 - \delta$, [...]

For any linear MDP satisfying Definition 3.1, we must have that $\|\phi(s,a)\|_2 \ge 1/\sqrt{d}$ for all $s$ and $a$, and $\|\phi_{\pi,h}\|_2 \ge 1/\sqrt{d}$ for all $\pi$ and $h$. By Definition 3.1, we know that $P_h(\cdot \mid s,a) = \langle \phi(s,a), \mu_h(\cdot) \rangle$ forms a valid probability distribution, and that $\left\| \int_{\mathcal{S}} |d\mu_h(s)| \right\|_2 \le \sqrt{d}$.

This yields the first equality. Repeating this calculation $h - 1$ more times yields the final equality.

Lemma A.8. Fix some $h$ and $i$ [...] $|\cdots(s,a)| \le 1$, and $\|v\|_2 \le \sqrt{d}$.

Proof. By the linear MDP structure (see Proposition 2.3 of Jin et al. (2020)), for any $j$, $Q^{\pi}_j(s,a) = \langle \phi(s,a), w^{\pi}_j \rangle = \langle \phi(s,a), \theta_j \rangle + \int \cdots$

We first consider the case where $u = \theta_h$ for some $\theta_h$ which is a valid reward satisfying Definition 3.1. Assume that the reward in our MDP is set such that for $h' \ne h$, $\theta_{h'} = 0$.
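The covering-number bound for the Euclidean ball stated at the start of this excerpt can be checked numerically. The sketch below is only an illustration and is not taken from the paper: it samples points from the ball, greedily builds a separated subset (pairwise distances larger than the covering radius), and compares its size with the stated bound. Every name in it (`covering_number_bound`, `sample_ball`, `greedy_separated_set`) and the chosen radius, covering radius, and dimension are assumptions made for the example. A separated set of this kind covers all sampled points and, by the standard volume argument, can never exceed the bound.

```python
import math
import random


def covering_number_bound(radius, eps, dim):
    """Stated upper bound (1 + 2R/eps)^d on the eps-covering number."""
    return (1.0 + 2.0 * radius / eps) ** dim


def sample_ball(radius, dim, rng):
    """Draw one point uniformly from the Euclidean ball of the given radius."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Scaling by U^(1/d) makes the radial distribution uniform over the ball.
    scale = radius * rng.random() ** (1.0 / dim) / norm
    return [scale * x for x in direction]


def greedy_separated_set(points, eps):
    """Keep a point only if it is farther than eps from every kept point."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers


# Hypothetical parameters chosen only for illustration.
rng = random.Random(0)
R, EPS, D = 1.0, 0.5, 2
points = [sample_ball(R, D, rng) for _ in range(2000)]
centers = greedy_separated_set(points, EPS)
bound = covering_number_bound(R, EPS, D)

print(f"greedy centers: {len(centers)}, bound: {bound}")
```

Each sampled point lies within the covering radius of some kept center, so the kept centers form a cover of the sample whose size stays below the bound. The greedy set is generally not a minimal cover, so this checks that the bound holds, not that it is tight.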
Apr-25-2026, 04:01:33 GMT