analysis of Algorithm

Apr-25-2026, 21:10:15 GMT–Neural Information Processing Systems

In this section, we provide a convergence rate analysis for Algorithm 1. Similar to Hazan et al. [36], Algorithm 1 has access to an approximate density oracle and an approximate planner defined below: Visitation density oracle: We assume access to an approximate density estimator that takes in a policy and a density approximation error d 0 as inputs and returns ˆd such that kd ˆd k1 d. Approximate planning oracle: We assume access to an approximate planner that, given any MDP M and error tolerance p 0, returns a policy such that JM() max JM() p. A.1 Proof of Theorem 1 We first give the following proposition that captures certain properties of the proposed objective. The proof is postponed to the end of this section. Taking the above proposition as given for the moment, we prove Theorem 1 following steps similar to those of Hazan et al. [36, Theorem 4.1]. Since k returned by the approximate planning oracle is an p-optimal policy in Mk, we have (1) 1hd k,rki (1) 1hd,rki p for any policy, including?. Therefore, It is straightforward to check that setting 0.1 1, p 0.1, d 0.1 1, 0.1, and the number of iterations K 1 log(10B 1) yields the claim of Theorem 1. Remark 2. Since the temperature parameter k in Proposition 1 goes to zero as k increases, one can show that the expected value of policy returned by Algorithm 1 converges to the maximum performance J(?).

artificial intelligence, gradient, machine learning, (17 more...)

Neural Information Processing Systems

Apr-25-2026, 21:10:15 GMT

Conferences PDF

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.71)

Duplicate Docs Excel Report

Title
5011bf6d8a37692913fce3a15a51f070-Supplemental.pdf

Similar Docs Excel Report more

Title	Similarity	Source
None found