Designing an efficient and equitable humanitarian supply chain dynamically via reinforcement learning

Jin, Weijia

arXiv.org Artificial Intelligence 

Specifically, Proximal Policy Optimization (PPO) is a policy gradient method, often used for deep reinforcement learning when the policy network is very large. The predecessor to PPO, Trust Region Policy Optimization (TRPO), was published in 2015 by Schulman et al. TRPO addressed the instability issue of another algorithm, the Deep Q-Network (DQN).
