2c53bc01e30711a08f6ac86919193022-Supplemental-Conference.pdf
Neural Information Processing Systems
Assumption 2. The following conditions are assumed throughout:
To do so, let's assume
Note that the exploratory state dynamics in (34) is governed by a general Itô process.
Theorem 6. Assume that for a policy π and for every x, σ
Assumption 3. Assume the following conditions hold:
By [56, Section 3.1], the HJ equation in (40) has a unique (subquadratic) viscosity solution
Lemma 9. Let π, π̂ be two feedback policies.
Proof of Theorem 2. Note that in (13), the equation to be proven, the right hand side can be written as d
Thus, we have shown LHS=RHS in (13).
We need a lemma for the perturbation bounds.
If we also assume that the upper bounds, i.e. σ
By Grönwall's inequality, we have
This correction will not affect the two numerical examples as both had set β = 1 (as a hyper-parameter).
Sep-29-2024, 01:10:23 GMT