Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function
Aviv Rosenberg, Yishay Mansour
Neural Information Processing Systems
We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes.
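To make the setting concrete, here is a minimal Python sketch built on assumptions that do not come from the paper. It uses a hypothetical three-layer MDP (states s0, s1, s2, a terminal state end, actions a and b) and two made-up loss functions, one per episode. All names in it (LAYERS, P, occupancy, expected_loss, regret, loss_1, loss_2, uniform) are illustrative. Because the MDP is loop-free, the probability of visiting each state and playing each action can be computed in one forward pass over the layers. A policy's expected episode loss is then the occupancy-weighted sum of losses, and regret compares the learner with the best fixed policy in hindsight. The sketch evaluates everything with full knowledge of P and of every loss function. In the setting of the title, the learner knows neither: the transition function is unknown and only bandit feedback is observed. So this shows only the benchmark a learner is measured against, not a learning algorithm.

```python
from itertools import product

# Hypothetical toy instance (not from the paper): a loop-free MDP whose states
# are split into layers, with every transition going from one layer to the next.
LAYERS = [["s0"], ["s1", "s2"], ["end"]]
ACTIONS = ["a", "b"]
# P[(x, a)] maps each next state to its probability. In the paper's setting
# this transition function is unknown to the learner.
P = {
    ("s0", "a"): {"s1": 1.0},
    ("s0", "b"): {"s1": 0.5, "s2": 0.5},
    ("s1", "a"): {"end": 1.0},
    ("s1", "b"): {"end": 1.0},
    ("s2", "a"): {"end": 1.0},
    ("s2", "b"): {"end": 1.0},
}


def occupancy(policy):
    """Return q[(x, a)], the probability that an episode visits x and plays a."""
    visit = {LAYERS[0][0]: 1.0}  # the episode starts in the single first-layer state
    q = {}
    for layer in LAYERS[:-1]:  # the last layer is terminal, so it has no actions
        for x in layer:
            px = visit.get(x, 0.0)
            for a, pa in policy[x].items():
                q[(x, a)] = px * pa
                for y, pxy in P[(x, a)].items():
                    visit[y] = visit.get(y, 0.0) + px * pa * pxy
    return q


def expected_loss(policy, loss):
    """Expected total loss of one episode: sum of q(x, a) * loss(x, a)."""
    return sum(p * loss[xa] for xa, p in occupancy(policy).items())


def regret(learner_policies, losses):
    """Learner's expected loss minus that of the best fixed policy in hindsight.

    Enumerating deterministic policies suffices here, because a fixed policy
    minimizing the summed losses of a finite-horizon MDP can be taken deterministic.
    """
    learner = sum(expected_loss(pi, loss) for pi, loss in zip(learner_policies, losses))
    states = [x for layer in LAYERS[:-1] for x in layer]
    best = min(
        sum(expected_loss({x: {a: 1.0} for x, a in zip(states, choice)}, loss) for loss in losses)
        for choice in product(ACTIONS, repeat=len(states))
    )
    return learner - best


# The adversary may pick a different (made-up) loss function in each episode.
loss_1 = {("s0", "a"): 0, ("s0", "b"): 1, ("s1", "a"): 1, ("s1", "b"): 0, ("s2", "a"): 0, ("s2", "b"): 0}
loss_2 = {("s0", "a"): 1, ("s0", "b"): 0, ("s1", "a"): 0, ("s1", "b"): 1, ("s2", "a"): 0, ("s2", "b"): 0}
uniform = {x: {"a": 0.5, "b": 0.5} for x in ("s0", "s1", "s2")}

print(expected_loss(uniform, loss_1))  # 0.875
print(regret([uniform, uniform], [loss_1, loss_2]))  # 0.25
```

On this toy instance the uniformly random learner pays 1.75 over the two episodes. The best fixed policy in hindsight plays b in s0 and pays 1.5, so the regret is 0.25.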
Feb-13-2026, 07:22:58 GMT