Reviews: Large Scale Markov Decision Processes with Changing Rewards

Jan-24-2025, 00:53:47 GMT–Neural Information Processing Systems

I still feel that the work would be greatly improved by adding numerical experiments. In particular, the authors refer to a specific setting called "online MDP", where the dynamics, that is, the transition probabilities, are known while the rewards are not. Regret minimization then refers to minimizing the regret given that rewards may be chosen and revealed in an adversarial manner. The authors start with a (rather technical) introduction, discuss related work, and explain the main ideas based on concise preliminaries. Afterwards, they provide an extension to large state spaces that uses approximate occupancy measures and thereby avoids concrete state mappings.
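To make the setting concrete, here is a minimal sketch of how regret could be measured in a toy online MDP with known transitions and changing rewards. It is illustrative only and not the authors' algorithm: the two-state transition table `P`, the helper names `expected_return` and `regret`, and the restriction of the comparator class to deterministic stationary policies over a finite number of rounds are all assumptions made for this example.

```python
from itertools import product

# Hypothetical toy problem: 2 states, 2 actions, known transitions.
# P[s][a] is the next-state distribution after taking action a in state s.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]


def expected_return(policy, rewards, start=(1.0, 0.0)):
    """Expected cumulative reward of a deterministic stationary policy.

    policy[s] is the action taken in state s; rewards[t][s][a] is the
    reward for round t, which may change arbitrarily between rounds.
    """
    dist = list(start)  # state distribution at the current round
    n = len(dist)
    total = 0.0
    for r_t in rewards:
        # Expected reward collected in this round.
        total += sum(dist[s] * r_t[s][policy[s]] for s in range(n))
        # Push the state distribution through the known dynamics.
        dist = [
            sum(dist[s] * P[s][policy[s]][s2] for s in range(n))
            for s2 in range(n)
        ]
    return total


def regret(learner_return, rewards, n_states=2, n_actions=2):
    """Gap to the best fixed deterministic policy in hindsight (assumed comparator)."""
    best = max(
        expected_return(pi, rewards)
        for pi in product(range(n_actions), repeat=n_states)
    )
    return best - learner_return
```

For example, with `rewards = [[[1.0, 0.0], [1.0, 0.0]]] * 3`, a learner that always plays action 1 collects nothing, while the best fixed policy always plays action 0. Enumerating policies like this is only feasible for tiny problems, which is the scaling issue that the occupancy-measure approximation discussed in the review is meant to address.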



