Learning Distributed and Fair Policies for Network Load Balancing as Markov Potential Game

Neural Information Processing Systems 

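At each time step $t \in H$ in a horizon $H$ of the game, each agent $i$ receives a workload $w_i(t) \in W$ [...] the workload [...] policy $\pi_i \in \Pi$, where $\Pi$ is the load [...] $t$, [...] an action $a_i(t) = \{a_{ij}(t)\}_{j=1}^{N}$, according [...] $w_i(t)$ are [...] $i(t)$.

[...] $\left(Q(o, a) - r(o, a) - \gamma\,\mathbb{E}_{o'}[V(o')]\right)^2$, where $V(o') = \mathbb{E}_{a'}[\bar{Q}(o', a') - \alpha \log \pi(a'|o')]$ and $\bar{Q}$ is the target Q network; the actor policy $\pi$ is updated with the gradient $\nabla \mathbb{E}_{o}[\mathbb{E}_{a \sim \pi}[\alpha \log \pi(a|o) - Q(o, a)]]$.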
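The critic and actor expressions in the excerpt have the shape of a soft actor-critic update. The following is a minimal sketch of those quantities for a small discrete action set, not the authors' implementation. The function names, the toy numbers, and the reading of the two coefficients lost in extraction as a discount factor `gamma` and an entropy temperature `alpha` are all assumptions made for illustration.

```python
import math


def soft_value(q_target, pi, alpha):
    """V(o') = E_{a'~pi}[Q_target(o', a') - alpha * log pi(a'|o')] over discrete actions."""
    return sum(p * (q - alpha * math.log(p)) for q, p in zip(q_target, pi) if p > 0.0)


def critic_loss(q_oa, reward, next_values, gamma):
    """Squared soft Bellman residual (Q(o,a) - r(o,a) - gamma * E_{o'}[V(o')])^2.

    next_values holds V(o') for sampled next observations; their mean
    stands in for the expectation over o'.
    """
    expected_v = sum(next_values) / len(next_values)
    return (q_oa - reward - gamma * expected_v) ** 2


def actor_objective(q, pi, alpha):
    """E_{a~pi}[alpha * log pi(a|o) - Q(o, a)]; the actor follows the gradient of this term."""
    return sum(p * (alpha * math.log(p) - qa) for qa, p in zip(q, pi) if p > 0.0)


# Hypothetical toy values: two actions, uniform policy.
pi = [0.5, 0.5]
q_values = [1.0, 3.0]
v_next = soft_value(q_values, pi, alpha=0.2)
loss = critic_loss(q_oa=2.0, reward=0.5, next_values=[v_next], gamma=0.9)
obj_uniform = actor_objective(q_values, pi, alpha=0.2)
obj_greedy = actor_objective(q_values, [0.1, 0.9], alpha=0.2)
```

In this toy setting, shifting probability mass toward the higher-valued action lowers the actor objective, while the `alpha * log pi` term penalizes policies that collapse onto a single action.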
