Algorithms for Learning Markov Field Policies
Neural Information Processing Systems
We use a graphical model to represent policies in Markov Decision Processes. This new representation can easily incorporate domain knowledge in the form of a state similarity graph, which loosely indicates which states are expected to have similar optimal actions. The policy search is then biased by sampling policies from a distribution that assigns high probability to policies that agree with the provided state similarity graph, i.e., smoother policies.
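To make the idea of a smoothness-biased policy distribution concrete, here is a minimal sketch. It is not taken from the paper. It assumes a Potts-style Markov random field prior over deterministic policies, P(pi) proportional to exp(-beta * sum of w over similarity edges (s, t) with pi(s) != pi(t)), and draws a policy from it by Gibbs sampling. The function names, the edge format (s, t, w), and the parameter beta are all hypothetical choices made for illustration.

```python
import math
import random


def disagreement(policy, edges):
    """Total weight of similarity edges whose endpoints pick different actions."""
    return sum(w for (s, t, w) in edges if policy[s] != policy[t])


def gibbs_sample_policy(n_states, n_actions, edges, beta=1.0, sweeps=50, seed=0):
    """Draw one policy from the assumed prior P(pi) ~ exp(-beta * disagreement(pi)).

    edges is a list of (s, t, w) tuples: states s and t are similar with weight w.
    Larger beta puts more mass on smooth policies (illustrative assumption).
    """
    rng = random.Random(seed)

    # Adjacency lists of the state similarity graph.
    neighbors = [[] for _ in range(n_states)]
    for s, t, w in edges:
        neighbors[s].append((t, w))
        neighbors[t].append((s, w))

    # Start from a uniformly random deterministic policy.
    policy = [rng.randrange(n_actions) for _ in range(n_states)]

    for _ in range(sweeps):
        for s in range(n_states):
            # Conditional of pi(s) given its neighbors:
            # P(pi(s) = a | rest) ~ exp(beta * weight of neighbors that chose a).
            scores = [0.0] * n_actions
            for t, w in neighbors[s]:
                scores[policy[t]] += beta * w
            top = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(x - top) for x in scores]
            policy[s] = rng.choices(range(n_actions), weights=weights)[0]
    return policy


if __name__ == "__main__":
    # A 6-state chain in which adjacent states are declared similar.
    chain = [(i, i + 1, 1.0) for i in range(5)]
    sample = gibbs_sample_policy(6, 2, chain, beta=2.0, sweeps=100, seed=0)
    print(sample, disagreement(sample, chain))
```

In a policy search loop, samples drawn this way would be evaluated on the MDP and the best ones kept. With beta set to 0 the prior is uniform over policies, so the similarity graph has no influence.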
Mar-14-2024, 13:48:51 GMT