Learning Distributed and Fair Policies for Network Load Balancing as Markov Potential Game
–Neural Information Processing Systems
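At $t \in H$ in a horizon $H$ of the g[…] $i$ receives $w_i(t) \in W$, the workload […] policy $\pi_i \in \Pi$, where […] is the load […] $t$, an action $a_i(t) = \{a_{ij}(t)\}_{j=1}^{N}$, according […] $w_i(t)$ are […] $i(t)$. […] $\left(Q(o, a) - r(o, a) - \gamma\, \mathbb{E}_{o'}[V(o')]\right)^2$, where $V(o') = \mathbb{E}_{a'}[\bar{Q}(o', a') - \alpha \log \pi(a' \mid o')]$ and $\bar{Q}$ is the target Q network; the actor policy $\pi$ is updated with the gradient $\nabla\, \mathbb{E}_{o}[\mathbb{E}_{a}[\alpha \log \pi(a \mid o) - Q(o, a)]]$.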
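The critic and actor quantities in the excerpt above appear to follow the usual soft actor-critic pattern, so a small numeric sketch may help make them concrete. The sketch assumes a finite action set (so expectations over actions become weighted sums), a discount factor `gamma`, an entropy temperature `alpha`, and a sample average for the expectation over next observations. The function names are illustrative and none of these choices are confirmed by the source.

```python
import math


def soft_value(q_target_row, policy_row, alpha):
    """V(o') = E_{a'~pi}[ Qbar(o', a') - alpha * log pi(a'|o') ].

    Assumes a finite action set: q_target_row[k] is the target-network
    value of action k and policy_row[k] is its probability.
    """
    return sum(
        p * (q - alpha * math.log(p))
        for q, p in zip(q_target_row, policy_row)
        if p > 0.0  # zero-probability actions contribute nothing
    )


def critic_loss(q_value, reward, next_values, gamma):
    """Squared soft Bellman residual (Q(o,a) - r(o,a) - gamma * E_{o'}[V(o')])^2.

    The expectation over next observations is approximated by the
    average of the sampled next-state values (an assumption).
    """
    expected_next = sum(next_values) / len(next_values)
    return (q_value - reward - gamma * expected_next) ** 2


def actor_objective(q_row, policy_row, alpha):
    """E_{a~pi}[ alpha * log pi(a|o) - Q(o, a) ] for one observation.

    The actor is updated along the gradient of this quantity with
    respect to the policy parameters; only the value is computed here.
    """
    return sum(
        p * (alpha * math.log(p) - q)
        for q, p in zip(q_row, policy_row)
        if p > 0.0
    )


# Hypothetical numbers for a two-action example.
v_next = soft_value([1.0, 3.0], [0.5, 0.5], alpha=0.2)
loss = critic_loss(q_value=2.5, reward=0.5, next_values=[v_next], gamma=0.9)
objective = actor_objective([1.0, 3.0], [0.5, 0.5], alpha=0.2)
```

With a uniform policy over two actions, the entropy bonus raises the soft value above the plain average of the two Q-values, and the actor objective evaluated on the same Q-values is the negative of that soft value.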
Feb-19-2026, 10:52:13 GMT
- Country:
- Europe
- Netherlands > North Holland
- Amsterdam (0.04)
- United Kingdom > England
- Greater London > London (0.04)
- South America > Chile
- Technology: