Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection

Neural Information Processing Systems 

We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.
