Scalable Multi-Agent Reinforcement Learning with General Utilities

Ying, Donghao; Ding, Yuhao; Koppel, Alec; Lavaei, Javad

arXiv.org Artificial Intelligence 

Many decision-making problems take a form beyond the classic cumulative reward, such as apprenticeship learning [1], diverse skill discovery [2], pure exploration [3], and state marginal matching [4], among others. Such problems can be abstracted as reinforcement learning (RL) with general utilities [5, 6], which seeks a policy that maximizes a nonlinear function of the induced state-action occupancy measure. This framework generalizes standard RL, whose objective is simply the inner product between the state-action occupancy measure induced by the policy and a policy-independent reward for each state-action pair. Beyond single-agent RL, consider the multi-agent problem, in which agents must interact to obtain a favorable outcome by finding a decision policy that maximizes the global accumulation of all agents' general utilities.
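The contrast between a linear objective (standard RL) and a nonlinear objective over the occupancy measure can be made concrete with the following minimal single-agent sketch. It assumes a small tabular, discounted MDP with known transitions. It computes the discounted state-action occupancy measure lambda(s, a) = d(s) pi(a|s) and evaluates two utilities on it: the inner product with a reward, and the entropy of the normalized state marginal, which is one illustrative exploration-style objective. All function names, and the particular choice of nonlinear utility, are hypothetical choices for this illustration and do not come from the paper.

```python
import numpy as np

def occupancy_measure(P, pi, rho, gamma):
    """Discounted occupancy lambda(s, a) = sum_t gamma^t Pr(s_t = s, a_t = a).

    P:   transitions, shape (S, A, S), P[s, a, s'] = Pr(s' | s, a)
    pi:  stochastic policy, shape (S, A), each row sums to 1
    rho: initial state distribution, shape (S,)
    """
    S = P.shape[0]
    # State-to-state kernel under pi: P_pi[s, s'] = sum_a pi[s, a] * P[s, a, s']
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Discounted state visitation d satisfies d = rho + gamma * P_pi^T d
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
    return d[:, None] * pi  # lambda(s, a) = d(s) * pi(a | s)

def linear_utility(lam, r):
    """Standard RL objective: inner product <lambda, r>."""
    return float(np.sum(lam * r))

def entropy_utility(lam, gamma):
    """Illustrative nonlinear utility: entropy of the normalized state marginal."""
    p = lam.sum(axis=1) * (1.0 - gamma)  # rescale so p sums to 1
    nz = p[p > 0]                        # skip zero entries, treating 0 * log 0 as 0
    return float(-(nz * np.log(nz)).sum())

# Usage on a random 4-state, 2-action MDP (hypothetical data)
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
pi = rng.random((S, A))
pi /= pi.sum(axis=1, keepdims=True)
rho = np.full(S, 1.0 / S)
r = rng.random((S, A))

lam = occupancy_measure(P, pi, rho, gamma)
print("linear utility :", linear_utility(lam, r))
print("entropy utility:", entropy_utility(lam, gamma))
```

The linear utility equals the usual expected discounted return. The entropy utility cannot be written as <lambda, r> for any fixed reward r, which is why it falls outside the standard framework. In the multi-agent setting described above, one such utility per agent would be accumulated into a global objective. That extension is not shown here.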
