ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning

Yizhao Jin, Greg Slabaugh, Simon Lucas

Nov-19-2023 - arXiv.org Artificial Intelligence

Lastly, in scenarios where multiple agents are present, a behavioral mixture of agents can be used; for example, Vinyals et al. (2019) sample the final agent from the Nash distribution over the set of agents. Because different agents, or experts, may recommend different actions for the same state, the mixture yields an intrinsically stochastic policy that takes advantage of the diversity in agent decisions. If the action space is continuous, a common approach is to transform the actions into a normal or beta distribution. For discrete action spaces, we apply one-hot encoding with a temperature-scaled softmax. A discrete action can be represented as a one-hot encoded vector. For instance, if action 2 out of 5 is chosen, its one-hot representation is [0, 1, 0, 0, 0]; we then scale the one-hot vector to [0, 1/τ, 0, 0, 0] and pass it through a softmax. The higher the temperature coefficient τ, the more spread out the distribution becomes, while a lower temperature coefficient pushes the distribution closer to a deterministic action.
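The one-hot and temperature step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `one_hot` and `temperature_softmax` are hypothetical, the action index is 0-based (so "action 2 out of 5" is index 1), and the sketch assumes the scaled vector is passed through a standard softmax.

```python
import math


def one_hot(action_index, num_actions):
    """Return a one-hot list with 1.0 at the (0-based) chosen action."""
    return [1.0 if i == action_index else 0.0 for i in range(num_actions)]


def temperature_softmax(one_hot_vec, tau):
    """Scale a one-hot vector by 1/tau, then apply a softmax.

    Assumption: tau > 0. A large tau flattens the distribution toward
    uniform; a small tau concentrates it on the chosen action.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    logits = [v / tau for v in one_hot_vec]      # e.g. [0, 1/tau, 0, 0, 0]
    max_logit = max(logits)                      # subtract max for stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Action 2 out of 5 (index 1), at a low and a high temperature.
probs_sharp = temperature_softmax(one_hot(1, 5), tau=0.1)
probs_flat = temperature_softmax(one_hot(1, 5), tau=10.0)
```

In this sketch `probs_sharp` puts nearly all probability mass on the chosen action, while `probs_flat` is close to uniform but still favors the chosen action slightly, which matches the qualitative behavior of τ described in the text.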


