A Multi-Agent, Policy-Gradient approach to Network Routing

Tao, Nigel, Baxter, Jonathan, Weaver, Lex

Dec-4-2025–arXiv.org Artificial Intelligence

Network routing is a distributed decision problem which naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a policy-gradient reinforcement learning algorithm, was successfully applied to simulated network routing under a number of network models. Multiple distributed agents (routers) learned co-operative behavior without explicit inter-agent communication, and they avoided behavior which was individually desirable, but detrimental to the group's overall performance. Furthermore, shaping the reward signal by explicitly penalizing certain patterns of sub-optimal behavior was found to dramatically improve the convergence rate.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

Dec-4-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.68)

Genre:
- Research Report (0.40)

Technology:
- Information Technology
  - Communications > Networks (1.00)
  - Artificial Intelligence
    - Representation & Reasoning
      - Agents (1.00)
      - Optimization (0.93)
    - Machine Learning
      - Reinforcement Learning (1.00)
      - Learning Graphical Models > Undirected Networks
        Markov Models (0.70)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found