Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function

Aviv Rosenberg, Yishay Mansour

Neural Information Processing Systems 

We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes.
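The setting in this sentence can be pictured with a small simulation. The sketch below is illustrative only and is not the paper's algorithm. It assumes the usual layered reading of "loop-free" (every transition moves from one layer to the next, so no state is revisited), and it uses random losses as a stand-in for an arbitrarily chosen loss function. The toy states, the names `LAYERS`, `TRANSITION`, `draw_losses`, and `run_episode`, and the uniform policy are all hypothetical. In line with the title, the learner samples from the transition table without reading it and sees only the losses along its own trajectory.

```python
import random

# Hypothetical toy instance: states are split into layers, and every
# transition moves from layer k to layer k + 1, so no state is revisited.
LAYERS = [["s0"], ["a", "b"], ["goal"]]
ACTIONS = [0, 1]

# TRANSITION[(state, action)] = (possible next states, their probabilities).
# The learner never reads this table; it only samples from it.
TRANSITION = {
    ("s0", 0): (["a", "b"], [0.8, 0.2]),
    ("s0", 1): (["a", "b"], [0.2, 0.8]),
    ("a", 0): (["goal"], [1.0]),
    ("a", 1): (["goal"], [1.0]),
    ("b", 0): (["goal"], [1.0]),
    ("b", 1): (["goal"], [1.0]),
}


def draw_losses(rng):
    """Stand-in for an arbitrary per-episode loss function with values in [0, 1)."""
    return {key: rng.random() for key in TRANSITION}


def run_episode(policy, loss, rng):
    """Play one episode; return only the (state, action, loss) triples visited."""
    state = LAYERS[0][0]
    feedback = []
    for _ in range(len(LAYERS) - 1):
        action = policy(state)
        # Bandit feedback: the loss is revealed only for the pair actually visited.
        feedback.append((state, action, loss[(state, action)]))
        next_states, probs = TRANSITION[(state, action)]
        state = rng.choices(next_states, weights=probs, k=1)[0]
    return feedback


rng = random.Random(0)
uniform_policy = lambda state: rng.choice(ACTIONS)
for episode in range(3):
    loss = draw_losses(rng)  # a fresh loss function every episode
    print(episode, run_episode(uniform_policy, loss, rng))
```

Each episode returns exactly one triple per non-terminal layer, which is all the information a learner in this setting would have to work with.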
