83adc9225e4deb67d7ce42d58fe5157c-Reviews.html
Neural Information Processing Systems
This paper introduces a variational framework for planning in infinite-horizon factored MDPs. Building on previous work by Liu and Ihler [13], the authors consider a variational "dual" representation of the maximum expected reward problem (the Bellman equation). Exploiting the factored structure of the transition probabilities and the additive form of the rewards, they introduce an approximation that can be solved by a double-loop, belief-propagation-style algorithm. This algorithm is shown to outperform approximate policy iteration and approximate linear programming on several instances of disease management and viral marketing MDPs. The main contribution of the paper is an extension of the approach of Liu and Ihler [13] from influence diagrams to the infinite-horizon MDP setting.
Mar-13-2024, 18:05:55 GMT