Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
