A Mathematical Derivation
Neural Information Processing Systems
The intrinsic reward function of skill discovery is shown as follows. We compare our approach with multi-agent value decomposition methods (QMIX and QPLEX), role-based methods (ROMA and RODE), a diversity-based method (CDS), and a skill-based method (HSD). We develop our method on top of PyMARL, the Python MARL framework released on GitHub with QMIX; our source code builds on it. The remaining hyper-parameters of our framework are listed in Table 2.

Table 2: Hyper-parameters.

Parameter                                                  Value
Algorithm hyper-parameters
  Discount factor                                          0.99
  Batch size                                               32
  Buffer size                                              5000
  Optimizer                                                RMSprop
  Learning rate                                            0.0005
  Interval of target network update                        200
Agent network hyper-parameters
  Temporal module in agent network                         GRU
  Dimensions of hidden states of temporal module           64
Mixing network hyper-parameters
  Dimensions of mixing network embedding                   32
  Number of hypernetwork layers                            2
  Dimensions of hypernetwork embedding                     64
HSL hyper-parameters
  Dimensions of skill representation encoder embedding     20
  Reward decoder scaling factor                            10
  Cosine distance scaling factor                           0.1, 1
  Skill representation learning mechanism training steps   50000
  Dimensions of skill selector encoding network embedding  32
  Decision interval of the skill selector                  5
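For concreteness, the settings in Table 2 can be collected into a single configuration dictionary, as is conventional in PyMARL-style codebases. This is a minimal sketch: the key names and the `should_update_target` helper are illustrative assumptions, not the paper's actual config schema.

```python
# Hypothetical configuration mirroring Table 2.
# Key names are illustrative; the paper's codebase may use different ones.
config = {
    # Algorithm hyper-parameters
    "discount_factor": 0.99,
    "batch_size": 32,
    "buffer_size": 5000,
    "optimizer": "RMSprop",
    "learning_rate": 0.0005,
    "target_update_interval": 200,
    # Agent network hyper-parameters
    "agent_temporal_module": "GRU",
    "agent_hidden_dim": 64,
    # Mixing network hyper-parameters
    "mixing_embed_dim": 32,
    "hypernet_layers": 2,
    "hypernet_embed_dim": 64,
    # HSL hyper-parameters
    "skill_encoder_embed_dim": 20,
    "reward_decoder_scale": 10,
    "cosine_distance_scales": (0.1, 1),
    "skill_repr_training_steps": 50_000,
    "skill_selector_embed_dim": 32,
    "skill_decision_interval": 5,
}


def should_update_target(step: int, cfg: dict = config) -> bool:
    """Sync the target network every `target_update_interval` training steps,
    matching the "Interval of target network update = 200" entry above."""
    return step % cfg["target_update_interval"] == 0
```

With this layout, the periodic target-network sync reduces to a modulo check inside the training loop, e.g. `if should_update_target(t): target_net.load_state_dict(net.state_dict())`.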