I2Q: A Fully Decentralized Q-Learning Algorithm

Neural Information Processing Systems 

The modeling of the ideal transition function in I2Q is fully decentralized and independent of the learned policies of other agents, which keeps I2Q free from non-stationarity and allows it to learn the optimal policy.
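
To make the idea concrete, the following is a minimal tabular sketch of Q-learning on an "ideal" transition. It is not the paper's implementation. It assumes a deterministic environment and simply takes the ideal outcome for an agent's own action to be the best outcome that agent has observed so far for that state-action pair, whatever the other agents did. The function name `ideal_q_update`, the `seen_next` memory, and the hyperparameters are all hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def ideal_q_update(q, seen_next, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One Q-update for a single agent that sees only its own action `a`.

    Assumption (not from the source): the environment is deterministic, and
    the ideal transition for (s, a) is the best outcome observed so far.
    """
    # Remember every (reward, next state) outcome seen for the agent's own (s, a).
    seen_next[(s, a)].add((r, s_next))

    # Target from the best observed outcome. It depends only on which outcomes
    # are reachable, not on how often other agents' policies produce them.
    target = max(
        rew + gamma * max(q[(nxt, b)] for b in actions)
        for rew, nxt in seen_next[(s, a)]
    )
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


# Usage: the same own action leads to different outcomes because the other
# agents (not modeled here) acted differently.
q = defaultdict(float)
seen_next = defaultdict(set)
actions = [0, 1]

v1 = ideal_q_update(q, seen_next, s=0, a=0, r=0.0, s_next=1, actions=actions)
v2 = ideal_q_update(q, seen_next, s=0, a=0, r=1.0, s_next=2, actions=actions)
v3 = ideal_q_update(q, seen_next, s=0, a=0, r=0.0, s_next=1, actions=actions)
```

In this sketch, the third update still moves the value toward the better outcome even though the observed outcome was the worse one, so the learning target does not shift as the other agents' behavior changes. That is the sense in which a target built on an ideal transition avoids non-stationarity.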