Scalable Multi-Agent Reinforcement Learning with General Utilities
Ying, Donghao, Ding, Yuhao, Koppel, Alec, Lavaei, Javad
–arXiv.org Artificial Intelligence
Many decision-making problems take a form beyond the classic cumulative reward, such as apprenticeship learning [1], diverse skill discovery [2], pure exploration [3], and state marginal matching [4], among others. Such problems can be abstracted as reinforcement Learning (RL) with general utilities [5, 6], which focus on finding a policy to maximize a nonlinear function of the induced stateaction occupancy measure. It generalizes the standard RL in which the objective is only an inner product between the state-action occupancy measure induced by the policy and a policy-independent reward for each state-action pair. Beyond the single agent RL, consider the multi-agent problem where different agents need to interact to obtain a favorable outcome by finding a decision policy that maximizes the global accumulation of all agent's general utility.
arXiv.org Artificial Intelligence
Aug-26-2023
- Country:
- North America > United States
- California (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- Genre:
- Research Report (0.40)
- Technology: