Optimistic Actor-Critic with Parametric Policies for Linear Markov Decision Processes
