On Improving Decentralized Hysteretic Deep Reinforcement Learning

Dec-18-2018–arXiv.org Machine Learning

Recent successes of value-based multi-agent deep reinforcement learning employ optimism by limiting underestimation updates of value function estimator, through carefully controlled learning rate (Omidshafiei et al., 2017) or reduced update probability (Palmer et al., 2018). To achieve full cooperation when learning independently, an agent must estimate the state values contingent on having optimal teammates; therefore, value overestimation is frequency injected to counteract negative effects caused by unobservable teammate sub-optimal policies and explorations. Aiming to solve this issue through automatic scheduling, this paper introduces a decentralized quantile estimator, which we found empirically to be more stable, sample efficient and more likely to converge to the joint optimal policy.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

Dec-18-2018

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents
    - Agent Societies (0.47)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Neural Networks > Deep Learning (0.31)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found