Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting

Neural Information Processing Systems 

This type of MDPs is commonly referred to as linear MDPs.