A Background
Neural Information Processing Systems
A.1 Partially Observable Markov Decision Process

We follow previous work [25] and consider MARL as a partially observable Markov game [22]. We define a set of states S describing the possible configurations of all n agents. Each agent i then receives a reward r as a function of the state and the agent's action. In the following, we use superscripts to indicate the agent's index and subscripts to indicate the time step for states, observations, rewards, and actions.

A.2 Decision Transformer

Decision Transformer [3] casts the problem of RL as conditional sequence modeling. It builds on the Transformer [44], an architecture that efficiently models sequential data. The core component of the Transformer is the attention mechanism [44].
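To make the attention mechanism concrete, the following is a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d)) V, written in plain Python. It is an illustration and not the implementation used in this work. The function names `softmax` and `attention`, the list-of-lists representation, and the optional `causal` flag are assumptions made for the example. The causal option restricts each position to attend only to itself and earlier positions, as is usual in autoregressive sequence modeling.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=False):
    """Scaled dot-product attention over lists of equal-length vectors.

    Hypothetical helper for illustration only. With causal=True,
    position t attends only to positions <= t.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        # Restrict the visible keys when a causal mask is requested.
        limit = t + 1 if causal else len(keys)
        scores = [
            sum(a * b for a, b in zip(q, k)) / scale
            for k in keys[:limit]
        ]
        weights = softmax(scores)
        # Each output is a weighted sum of the visible value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values[:limit]))
            for j in range(dim)
        ])
    return outputs


# Toy sequence of three 2-dimensional tokens (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = attention(tokens, tokens, tokens, causal=True)
```

With the causal flag set, the first output equals the first value vector, because the first position can attend only to itself. Later positions mix in information from all earlier tokens.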
Feb-18-2024, 04:16:58 GMT