Designing an efficient and equitable humanitarian supply chain dynamically via reinforcement learning

May-26-2025 – arXiv.org Artificial Intelligence

Specifically, Proximal Policy Optimization (PPO) is a policy gradient method, often used for deep reinforcement learning when the policy network is very large. The predecessor to PPO, Trust Region Policy Optimization (TRPO), was published in 2015 by Schulman et al. TRPO addressed the instability of another algorithm, the Deep Q-Network (DQN).

evolutionary algorithm, machine learning, reinforcement learning

Genre:
- Research Report (1.00)

Industry:
- Health & Medicine
  - Therapeutic Area > Immunology (0.68)
  - Pharmaceuticals & Biotechnology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning
    - Optimization (1.00)
    - Agents (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Evolutionary Systems (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.46)
