main

Apr-24-2026, 21:32:21 GMT–Neural Information Processing Systems

We then discuss, in Section 2.2, the challenges one confronts when attempting to address the above two problems directly using derivative-free PG methods by sampling system trajectories. Fortunately, solving zero-sum LQ (stochastic) dynamic games, a benchmark setting in MARL, via derivative-free PG methods by sampling system trajectories provides a workaround that addresses these problems in a unified way, due to the well-known equivalence relationships between zero-sum LQ dynamic games and the two aforementioned classes of problems [25], which we will also discuss in Section A.3.3.

A.3.1 Linear Exponential Quadratic Gaussian

We first consider a fundamental setting of risk-sensitive optimal control, known as the LEQG problem [22, 27, 28], in the finite-horizon setting. The time-varying (linear) system dynamics are described by

$$x_{t+1} = A_t x_t + B_t u_t + w_t, \quad t \in \{0, \dots, N-1\},$$

where $x_t \in \mathbb{R}^m$ represents the system state; $u_t \in \mathbb{R}^d$ is the control input; $w_t \in \mathbb{R}^m$ is an independent (across time) Gaussian random noise drawn from $w_t \sim \mathcal{N}(0, W)$ for some $W > 0$; the initial state $x_0 \sim \mathcal{N}(0, X_0)$ is a Gaussian random vector for some $X_0 > 0$, independent of the sequence $\{w_t\}$; and $A_t$, $B_t$ are time-varying system matrices with appropriate dimensions.
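To make the sampling of system trajectories concrete, the following is a minimal sketch of a single rollout of the dynamics above, assuming NumPy. The function name `simulate_trajectory`, the `policy` callback, the feedback gain `K`, and all numerical values are hypothetical choices for illustration and do not come from the paper.

```python
import numpy as np


def simulate_trajectory(A_seq, B_seq, W, X0, policy, rng):
    """Roll out x_{t+1} = A_t x_t + B_t u_t + w_t for t = 0, ..., N-1.

    A_seq, B_seq: lists of N matrices with shapes (m, m) and (m, d).
    W, X0: positive definite covariances of the noise and the initial state.
    policy: hypothetical callback (t, x_t) -> u_t with shape (d,).
    """
    N = len(A_seq)
    m = W.shape[0]
    x = rng.multivariate_normal(np.zeros(m), X0)  # x_0 ~ N(0, X0)
    xs, us, ws = [x], [], []
    for t in range(N):
        u = policy(t, x)
        w = rng.multivariate_normal(np.zeros(m), W)  # w_t ~ N(0, W), independent across t
        x = A_seq[t] @ x + B_seq[t] @ u + w
        xs.append(x)
        us.append(u)
        ws.append(w)
    return np.array(xs), np.array(us), np.array(ws)


# Illustrative problem sizes and matrices (assumed, not from the source).
m, d, N = 2, 1, 5
rng = np.random.default_rng(0)
A_seq = [0.9 * np.eye(m) + 0.05 * rng.standard_normal((m, m)) for _ in range(N)]
B_seq = [rng.standard_normal((m, d)) for _ in range(N)]
W = 0.1 * np.eye(m)
X0 = np.eye(m)

K = 0.1 * np.ones((d, m))  # hypothetical linear state-feedback gain
xs, us, ws = simulate_trajectory(A_seq, B_seq, W, X0, lambda t, x: -K @ x, rng)
print(xs.shape, us.shape, ws.shape)
```

The rollout returns the noise samples alongside the states and inputs only so that the recursion can be checked afterwards. A derivative-free method would need just the sampled states and inputs to evaluate a cost along the trajectory.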



Genre:
- Research Report (0.45)

Technology:
- Information Technology
  - Game Theory (1.00)
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Representation & Reasoning (0.93)
