Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs

Neural Information Processing Systems 

We also initiate the study of sample complexity in general (multichain) average-reward MDPs.
