Entropy-Regularized Partially Observed Markov Decision Processes

Dec-22-2021–arXiv.org Artificial Intelligence

We investigate partially observed Markov decision processes (POMDPs) with cost functions regularized by entropy terms describing state, observation, and control uncertainty. Standard POMDP techniques are shown to offer bounded-error solutions to these entropy-regularized POMDPs, with exact solutions when the regularization involves the joint entropy of the state, observation, and control trajectories. Our joint-entropy result is particularly surprising since it constitutes a novel, tractable formulation of active state estimation. Partially observed Markov decision processes (POMDPs) and Markov decision processes (MDPs) with information-theoretic costs have attracted widespread attention across the technical disciplines of systems and control [2]-[5], computer science [6]-[8], signal processing [9]-[12], and robotics [13]-[15]. Interest in such POMDPs has been driven, in large part, by active state estimation problems in which informationtheoretic costs describing the uncertainty about latent states are minimized in order to aid or enhance the performance of state estimation algorithms [5], [6], [9], [10].

entropy, input-output entropy, pomdp, (13 more...)

arXiv.org Artificial Intelligence

Dec-22-2021

arXiv.org PDF

Add feedback

Country:
- Oceania > Australia (0.14)
- North America > United States
  - New York (0.04)
  - Massachusetts > Middlesex County
    - Belmont (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Switzerland > Zürich
    - Zürich (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)