
Neural Information Processing Systems 

We then discuss, in 2.2, the challenges that arise when attempting to address the above two problems directly with derivative-free PG methods that sample system trajectories. Fortunately, solving zero-sum LQ (stochastic) dynamic games, a benchmark setting in MARL, via derivative-free PG methods that sample system trajectories provides a workaround that addresses these problems in a unified way. This works because of the well-known equivalence between zero-sum LQ dynamic games and the two aforementioned classes of problems [25], which we also discuss in A.3.3.

A.3.1 Linear Exponential Quadratic Gaussian

We first consider a fundamental setting of risk-sensitive optimal control, the LEQG problem [22, 27, 28], in the finite-horizon setting. The time-varying linear system dynamics are described by

$$x_{t+1} = A_t x_t + B_t u_t + w_t, \quad t \in \{0, \dots, N-1\},$$

where $x_t \in \mathbb{R}^m$ is the system state; $u_t \in \mathbb{R}^d$ is the control input; $w_t \in \mathbb{R}^m$ is Gaussian noise, independent across time, drawn from $w_t \sim \mathcal{N}(0, W)$ for some $W > 0$; the initial state $x_0 \sim \mathcal{N}(0, X_0)$ is a Gaussian random vector for some $X_0 > 0$, independent of the sequence $\{w_t\}$; and $A_t$, $B_t$ are time-varying system matrices of appropriate dimensions.
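Since the derivative-free approach relies on sampling system trajectories, it may help to see what one rollout of these dynamics looks like. The sketch below simulates the time-varying system and, as an illustrative assumption, uses the rollouts to form a Monte Carlo estimate of the standard finite-horizon LEQG objective $\frac{2}{\beta}\log \mathbb{E}\exp\big(\frac{\beta}{2}\big[\sum_{t=0}^{N-1}(x_t^\top Q_t x_t + u_t^\top R_t u_t) + x_N^\top Q_N x_N\big]\big)$. The cost matrices $Q_t$, $R_t$, $Q_N$, the risk parameter $\beta > 0$, the linear policy $u_t = -K_t x_t$, and all function names are hypothetical choices that this excerpt does not define. The estimate is only meaningful when $\beta$ is small enough that the expectation is finite. The code uses only the Python standard library, so matrices are represented as lists of rows.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def cholesky(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive-definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def gaussian(L, rng):
    """Draw one sample from N(0, L L^T) given the Cholesky factor L."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return matvec(L, z)


def quad(v, M):
    """Quadratic form v^T M v."""
    return sum(a * b for a, b in zip(v, matvec(M, v)))


def sample_trajectory(A, B, K, LW, LX, rng):
    """Roll out x_{t+1} = A_t x_t + B_t u_t + w_t under the hypothetical policy u_t = -K_t x_t.

    LW and LX are Cholesky factors of W and X0. Returns states x_0..x_N and inputs u_0..u_{N-1}.
    """
    x = gaussian(LX, rng)  # x_0 ~ N(0, X0)
    xs, us = [x], []
    for t in range(len(A)):  # t = 0, ..., N-1
        u = [-c for c in matvec(K[t], x)]
        w = gaussian(LW, rng)  # w_t ~ N(0, W), drawn independently at each step
        x = [a + b + c for a, b, c in zip(matvec(A[t], x), matvec(B[t], u), w)]
        xs.append(x)
        us.append(u)
    return xs, us


def leqg_cost(A, B, K, W, X0, Q, R, QN, beta, n_samples=2000, seed=0):
    """Monte Carlo estimate of (2/beta) * log E[exp(beta/2 * C)], C = quadratic trajectory cost."""
    rng = random.Random(seed)
    LW, LX = cholesky(W), cholesky(X0)
    vals = []
    for _ in range(n_samples):
        xs, us = sample_trajectory(A, B, K, LW, LX, rng)
        c = sum(quad(xs[t], Q[t]) + quad(us[t], R[t]) for t in range(len(us)))
        c += quad(xs[-1], QN)  # terminal cost x_N^T Q_N x_N
        vals.append(0.5 * beta * c)
    top = max(vals)  # log-sum-exp shift for numerical stability
    log_mean = top + math.log(sum(math.exp(v - top) for v in vals) / len(vals))
    return 2.0 / beta * log_mean
```

As a quick check, take $N = 1$, $A_0 = B_0 = 0$, $W = X_0 = 1$, $Q_0 = R_0 = 0$, and $Q_N = 1$. The cost is then $w_0^2$, and the exact objective is $-\log(1-\beta)/\beta$ for $\beta < 1$, which the estimator should approach as `n_samples` grows.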
