A Additional details for experiment presented in Section 3 (Motivation)

We trained each agent i with online Q-learning [33] on its Q-function.

The Boltzmann temperature is fixed to 1, and we set the learning rate to 0.05 and the discount factor to 0.99.

To maximize their return, agents must therefore spread out and cover all landmarks.

We use a discount factor γ of 0.95. Since policies' hidden layers are of size 128, the corresponding value for

During training, a policy is evaluated on a set of 10 different episodes every 100 learning steps. TeamReg is outperformed by all other algorithms when considering average return across agents.
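As an illustration of the tabular setup described at the start of this appendix, the following is a minimal sketch of one agent's online Q-learning update with Boltzmann (softmax) exploration at temperature 1, learning rate 0.05, and discount factor 0.99. The Q-table layout and the env.step interface are assumptions made for the example, not part of the paper.

```python
import numpy as np

def boltzmann_action(q_row, temperature=1.0):
    """Sample an action from the softmax (Boltzmann) distribution over Q-values."""
    logits = q_row / temperature
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return np.random.choice(len(q_row), p=probs)

def q_learning_step(Q, s, env, lr=0.05, gamma=0.99):
    """One online Q-learning transition for a single agent: act, observe, update Q in place."""
    a = boltzmann_action(Q[s], temperature=1.0)
    s_next, r, done = env.step(a)          # hypothetical environment interface
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += lr * (target - Q[s, a])     # temporal-difference update
    return s_next, done

# Q would be initialized as np.zeros((n_states, n_actions)) for each agent,
# with n_states and n_actions set by the environment (placeholders here).
```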