SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning

Oct-10-2024, 10:05:40 GMT–Neural Information Processing Systems

Value factorisation is a useful technique for multi-agent reinforcement learning (MARL) in global reward games; however, its underlying mechanism is not yet fully understood. This paper studies a theoretical framework for interpretable value factorisation based on Shapley value theory. We generalise the Shapley value to the Markov convex game, calling the result the Markov Shapley value (MSV), and apply it as a value factorisation method in the global reward game, which is made possible by the equivalence between the two games. Based on the properties of MSV, we derive the Shapley-Bellman optimality equation (SBOE) to evaluate the optimal MSV, which corresponds to an optimal joint deterministic policy. Furthermore, we propose the Shapley-Bellman operator (SBO), which is proved to solve the SBOE.
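
For readers unfamiliar with the Shapley value that the abstract builds on, the following is a minimal sketch of the classic, static Shapley value for a small cooperative game. It is not the paper's Markov Shapley value, SBOE, or SBO. The function `shapley_values`, the player weights, and the toy value function `toy_value` are assumptions made purely for illustration. Each player's share is the average of its marginal contributions over all orderings of the players.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players to the worth of that coalition.
    This is the classic static definition, not the paper's Markov variant.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical weights and value function, chosen so that the game is convex:
# the worth of a coalition is the square of its total weight.
WEIGHTS = {"a": 1.0, "b": 2.0, "c": 3.0}


def toy_value(coalition):
    return sum(WEIGHTS[p] for p in coalition) ** 2


players = list(WEIGHTS)
phi = shapley_values(players, toy_value)
grand_total = toy_value(frozenset(players))
```

In this toy game the shares sum to the worth of the grand coalition (the efficiency property), which is the sense in which a Shapley-style factorisation splits a single global value among agents. Enumerating all orderings costs factorial time in the number of players, so this form is only practical for very small games.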

incorporating shapley value theory, multi-agent q-learning, value factorisation method

Genre:
- Play > Prospect (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)