A Additional details for the experiment presented in Section 3 (Motivation)

We trained each agent i with online Q-learning [33] on the Q