Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function

Aviv Rosenberg, Yishay Mansour

Oct-3-2025, 08:46:10 GMT–Neural Information Processing Systems

We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes.

algorithm, bandit uc-o-rep, transition function

Country:
- North America
  - United States > California
    - Los Angeles County > Long Beach (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Jordan (0.04)
  - Israel > Tel Aviv District
    - Tel Aviv (0.04)

Industry:
- Education (0.35)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.94)
  - Machine Learning > Learning Graphical Models
    - Undirected Networks > Markov Models (0.35)
