Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task

Sallans, Brian, Hinton, Geoffrey E.

Neural Information Processing Systems 

The problem of reinforcement learning in large factored Markov decision processes is explored. The Q-value of a state-action pair is approximated by the free energy of a product of experts network. Network parameters are learned online using a modified SARSA algorithm which minimizes the inconsistency of the Q-values of consecutive state-action pairs. Actions are chosen based on the current value estimates by fixing the current state and sampling actions from the network using Gibbs sampling. The algorithm is tested on a cooperative multi-agent task. The product of experts model is found to perform comparably to table-based Q-learning for small instances of the task, and continues to perform well when the problem becomes too large for a table-based representation.
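
The abstract does not give the network's equations, so the following is only a minimal sketch of the three ingredients it names, under stated assumptions: the product of experts is taken to be a restricted Boltzmann machine with binary hidden units, states and actions are binary vectors, and the Q-value is the negative free energy of the clamped state-action pair. The names `q_value`, `sarsa_update`, and `sample_action`, the bias terms, the learning rate, and the number of Gibbs sweeps are all illustrative choices, not details taken from the paper.

```python
import math
import random

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def hidden_inputs(W, c, v):
    # Total input to each hidden unit (expert) given visible vector v.
    return [c[k] + sum(W[k][j] * v[j] for j in range(len(v)))
            for k in range(len(c))]

def q_value(W, b, c, state, action):
    # Assumed form: Q(s, a) = -F(s, a), with F the RBM free energy
    # F(v) = -b.v - sum_k softplus(c_k + W_k.v), v = state + action bits.
    v = state + action
    visible_term = sum(b[j] * v[j] for j in range(len(v)))
    return visible_term + sum(softplus(x) for x in hidden_inputs(W, c, v))

def sarsa_update(W, b, c, s, a, r, s2, a2, gamma=0.9, lr=0.01):
    # SARSA-style step: move Q(s, a) toward r + gamma * Q(s2, a2).
    td = r + gamma * q_value(W, b, c, s2, a2) - q_value(W, b, c, s, a)
    v = s + a
    h = [sigmoid(x) for x in hidden_inputs(W, c, v)]
    for k in range(len(c)):
        for j in range(len(v)):
            W[k][j] += lr * td * h[k] * v[j]  # dQ/dW_kj = h_k * v_j
        c[k] += lr * td * h[k]                # dQ/dc_k = h_k
    for j in range(len(v)):
        b[j] += lr * td * v[j]                # dQ/db_j = v_j
    return td

def sample_action(W, b, c, state, n_action_bits, sweeps=20, rng=random):
    # Gibbs sampling with the state bits clamped: alternate between
    # sampling hidden units and resampling only the action bits.
    n_s = len(state)
    action = [rng.randint(0, 1) for _ in range(n_action_bits)]
    for _ in range(sweeps):
        v = state + action
        h = [1 if rng.random() < sigmoid(x) else 0
             for x in hidden_inputs(W, c, v)]
        for i in range(n_action_bits):
            j = n_s + i
            x = b[j] + sum(W[k][j] * h[k] for k in range(len(c)))
            action[i] = 1 if rng.random() < sigmoid(x) else 0
    return action
```

Under these assumptions, actions with lower free energy (higher Q) are sampled more often, which is how sampling from the network can stand in for a search over a large factored action space. A temperature parameter for controlling exploration is omitted here for brevity.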
