Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning

Neural Information Processing Systems 

In [61], the authors recover the true supervision signals with a peer loss, which penalizes over-agreement to avoid overfitting.
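
To make the idea of penalizing over-agreement concrete, here is a minimal sketch. It assumes the commonly used form of peer loss: a base loss on each sample, minus a weighted base loss on a prediction and a noisy label drawn independently at random. The names `cross_entropy` and `peer_loss`, the weight `alpha`, and the choice of cross-entropy as the base loss are illustrative assumptions, not details taken from [61].

```python
import math
import random


def cross_entropy(probs, label):
    """Base loss (assumed here): negative log-probability of the given label."""
    return -math.log(max(probs[label], 1e-12))


def peer_loss(pred_probs, noisy_labels, alpha=1.0, rng=None):
    """Average peer loss over a batch (illustrative sketch).

    pred_probs:   list of per-sample class-probability lists
    noisy_labels: list of (possibly corrupted) integer labels
    alpha:        hypothetical weight on the peer term
    """
    rng = rng or random.Random(0)
    n = len(noisy_labels)
    total = 0.0
    for i in range(n):
        # Ordinary fit term on the matched (prediction, label) pair.
        fit = cross_entropy(pred_probs[i], noisy_labels[i])
        # Peer term: a prediction and a label drawn independently,
        # so they are unrelated to each other.
        j = rng.randrange(n)
        k = rng.randrange(n)
        peer = cross_entropy(pred_probs[j], noisy_labels[k])
        # Subtracting the peer term removes any gain from agreeing with
        # the label marginal regardless of the input.
        total += fit - alpha * peer
    return total / n
```

With `alpha=0` this reduces to the plain average cross-entropy. With `alpha=1`, a predictor that outputs the same distribution for every input on a batch of identical labels scores exactly zero, so simply echoing the majority label earns nothing.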
