Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design
