Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise
Wu, Xuefei, Yin, Xiao, Zhu, Yuanyang, Chen, Chunlin
–arXiv.org Artificial Intelligence
-- Efficient exploration in multi-agent reinforcement learning (MARL) is a challenging problem when agents receive only a team reward, especially in environments with sparse rewards. A powerful method to mitigate this issue involves crafting dense individual rewards to guide the agents toward efficient exploration. However, individual rewards generally rely on manually engineered shaping-reward functions that lack high-order intelligence, and thus perform less effectively than humans in learning and generalization for complex problems. To tackle these issues, we combine the above two paradigms and propose a novel framework, LIGHT (Learning Individual Intrinsic reward via Incorporating Generalized Human experTise), which can integrate human knowledge into MARL algorithms in an end-to-end manner. LIGHT guides each agent to avoid unnecessary exploration by considering both the individual action distribution and the human expertise preference distribution. LIGHT then designs an individual intrinsic reward for each agent, based on an actionable representational transformation relevant to Q-learning, so that agents align their action preferences with human expertise while maximizing the joint action value. Experimental results demonstrate the superiority of our method over representative baselines in both performance and knowledge reusability across different sparse-reward tasks in challenging scenarios.

Cooperative multi-agent reinforcement learning (MARL) is an important branch of artificial intelligence (AI), playing a crucial role in challenging sequential decision-making problems such as autonomous driving [1], sensor networks [2], [3], and robotics control [4].
The centralized training with decentralized execution (CTDE) paradigm has gained substantial attention in cooperative MARL; it aims to facilitate agent cooperation by providing global state information during training while executing based only on local observations [5], [6], [7].
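The abstract does not give LIGHT's exact reward formula, but the core idea of shaping an intrinsic reward from the gap between an agent's action distribution and a human preference distribution can be sketched as follows. This is a minimal illustration only: the function names, the choice of KL divergence, and the scaling factor are assumptions for exposition, not the authors' definitions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two discrete probability distributions.
    A small epsilon avoids log(0) and division by zero."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def intrinsic_reward(agent_action_dist, human_pref_dist, scale=0.1):
    """Hypothetical intrinsic reward: larger (closer to zero) when the
    agent's action distribution matches the human preference distribution,
    penalizing divergence from the expertise prior."""
    return -scale * kl_divergence(agent_action_dist, human_pref_dist)

# An agent whose action preferences match the human prior earns a
# larger intrinsic bonus than one that diverges from it.
aligned = intrinsic_reward([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
misaligned = intrinsic_reward([0.1, 0.2, 0.7], [0.7, 0.2, 0.1])
assert aligned > misaligned
```

In an actual MARL pipeline this per-agent bonus would be added to the shared team reward during training, so that the total objective still maximizes the joint action value while nudging each agent toward expert-preferred actions.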
Jul-28-2025
- Country:
- Asia > China > Jiangsu Province > Nanjing (0.05)
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology (0.34)
- Leisure & Entertainment > Games (0.47)
- Transportation > Ground > Road (0.34)