A Multi-Agent, Policy-Gradient approach to Network Routing
Tao, Nigel; Baxter, Jonathan; Weaver, Lex
arXiv.org Artificial Intelligence
Network routing is a distributed decision problem that naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a policy-gradient reinforcement learning algorithm, was successfully applied to simulated network routing under a number of network models. Multiple distributed agents (routers) learned cooperative behavior without explicit inter-agent communication, and they avoided behavior that was individually desirable but detrimental to the group's overall performance. Furthermore, shaping the reward signal by explicitly penalizing certain patterns of sub-optimal behavior was found to dramatically improve the convergence rate.
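As a rough illustration of the idea in the abstract, the following is a minimal sketch of an OLPOMDP-style update, not the authors' implementation or any of their network models. It assumes a softmax policy per router, a toy topology of two routers in series, and made-up link delays; the names `Router`, `train`, `step_size`, and `beta` are hypothetical. Each router keeps an eligibility trace of its own log-policy gradients and scales it by a single global reward (the negative total delay), so the routers never exchange messages with each other.

```python
import math
import random


def softmax(prefs):
    """Turn link preferences into a probability distribution."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class Router:
    """One agent: a softmax policy over its outgoing links (assumed form)."""

    def __init__(self, n_links, step_size, beta, rng):
        self.theta = [0.0] * n_links   # policy parameters
        self.trace = [0.0] * n_links   # eligibility trace z
        self.step_size = step_size
        self.beta = beta               # trace discount, 0 <= beta < 1
        self.rng = rng

    def choose_link(self):
        probs = softmax(self.theta)
        link = self.rng.choices(range(len(probs)), weights=probs)[0]
        # z <- beta * z + grad log pi(link); for a softmax policy the
        # gradient with respect to theta[k] is 1[k == link] - probs[k].
        for k, p in enumerate(probs):
            grad = (1.0 if k == link else 0.0) - p
            self.trace[k] = self.beta * self.trace[k] + grad
        return link

    def learn(self, reward):
        # theta <- theta + step_size * reward * z
        for k in range(len(self.theta)):
            self.theta[k] += self.step_size * reward * self.trace[k]


def train(steps=50000, seed=0, step_size=0.002, beta=0.5):
    """Train two routers in series on a shared reward (toy setup)."""
    rng = random.Random(seed)
    # Hypothetical per-link delays: router 0 has a fast link 0,
    # router 1 has a fast link 1.
    delays = [[1.0, 3.0], [3.0, 1.0]]
    routers = [Router(2, step_size, beta, rng) for _ in delays]
    for _ in range(steps):
        links = [r.choose_link() for r in routers]
        total_delay = sum(d[l] for d, l in zip(delays, links))
        reward = -total_delay          # same global signal for every router
        for r in routers:
            r.learn(reward)
    return routers
```

Calling `train()` and then `softmax(router.theta)` for each returned router gives the learned link probabilities. In this toy setup a larger `beta` lets rewards credit older decisions at the cost of noisier updates, which is why the sketch pairs a moderate `beta` with a small `step_size`. The reward shaping mentioned in the abstract would correspond to subtracting an extra penalty from `reward` for specific undesirable patterns; which patterns the authors penalized is not stated here, so none is modeled.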
Dec-4-2025
- Country:
- Europe > Netherlands
- North Holland > Amsterdam (0.04)
- North America > United States
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Pennsylvania > Allegheny County
- Pittsburgh (0.04)
- Oceania > Australia
- Australian Capital Territory > Canberra (0.04)
- Genre:
- Research Report (0.40)
- Technology: