ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning
Yizhao Jin, Greg Slabaugh, Simon Lucas
arXiv.org Artificial Intelligence
Lastly, in scenarios where multiple agents are present, a behavioral mixture-of-agents approach can be used; for example, Vinyals et al. (2019) sample the final agent from the Nash distribution over the set of agents. Because different agents, or experts, may recommend different actions for the same state, the mixture yields an intrinsically stochastic policy that takes advantage of the diversity in agent decisions. If the action space is continuous, a common approach is to transform the actions into a normal or beta distribution. For discrete actions, we apply one-hot encoding with a temperature-scaled softmax. A discrete action can be represented as a one-hot encoded vector: for instance, if action 2 out of 5 is chosen, its one-hot representation is [0, 1, 0, 0, 0]. We then scale the one-hot vector to [0, 1/τ, 0, 0, 0] before the softmax. The higher the temperature coefficient τ, the more spread out the distribution becomes, while a lower temperature coefficient nudges the distribution closer to a deterministic action.
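The temperature scaling described above can be illustrated with a minimal sketch. It assumes the scaled one-hot vector is passed through a standard softmax; the function names (one_hot, temperature_softmax, sample_action) are hypothetical and are not taken from the paper. Actions are 0-indexed here, so "action 2 out of 5" corresponds to index 1.

```python
import math
import random


def one_hot(action, num_actions):
    """Return a one-hot list with 1.0 at the (0-indexed) chosen action."""
    return [1.0 if i == action else 0.0 for i in range(num_actions)]


def temperature_softmax(action, num_actions, tau):
    """Turn a discrete action into a distribution over all actions.

    Assumption: the one-hot vector is divided by tau and then passed
    through a softmax, as the text above suggests.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    logits = [v / tau for v in one_hot(action, num_actions)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng=random):
    """Draw one action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    for tau in (0.1, 1.0, 10.0):
        probs = temperature_softmax(action=1, num_actions=5, tau=tau)
        print(tau, [round(p, 3) for p in probs])
```

Under this assumption, a small τ concentrates nearly all probability on the chosen action, while a large τ pushes the distribution toward uniform, which matches the behavior described in the abstract.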
Nov-19-2023
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Leisure & Entertainment > Games (1.00)
- Technology: