AITopics | disturbance attenuation problem

main

Neural Information Processing Systems, Apr-24-2026, 21:32:21 GMT

We then discuss, in 2.2, the challenges one confronts when attempting to address the above two problems directly using derivative-free PG methods by sampling system trajectories. Fortunately, solving zero-sum LQ (stochastic) dynamic games, a benchmark setting in MARL, via derivative-free PG methods by sampling system trajectories provides a workaround that addresses these problems in a unified way, owing to the well-known equivalence between zero-sum LQ dynamic games and the two aforementioned classes of problems [25], which we also discuss in A.3.3.

A.3.1 Linear Exponential Quadratic Gaussian

We first consider a fundamental setting of risk-sensitive optimal control, known as the LEQG problem [22, 27, 28], in the finite-horizon setting. The time-varying (linear) system dynamics are described by $x_{t+1} = A_t x_t + B_t u_t + w_t$, $t \in \{0, \dots, N-1\}$, where $x_t \in \mathbb{R}^m$ represents the system state; $u_t \in \mathbb{R}^d$ is the control input; $w_t \in \mathbb{R}^m$ is an independent (across time) Gaussian random noise drawn from $w_t \sim \mathcal{N}(0, W)$ for some $W > 0$; the initial state $x_0 \sim \mathcal{N}(0, X_0)$ is a Gaussian random vector for some $X_0 > 0$, independent of the sequence $\{w_t\}$; and $A_t$, $B_t$ are time-varying system matrices with appropriate dimensions.
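The finite-horizon dynamics in this excerpt can be illustrated with a minimal simulation sketch. It is not the paper's code: the linear feedback u_t = -K_t x_t, the diagonal noise covariances, and all names (`rollout`, `K_seq`, `w_var`, `x0_var`) and numerical values are assumptions made for illustration only.

```python
import random


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def sample_gaussian(variances, rng):
    """Draw from N(0, diag(variances)); the diagonal covariance is an assumption."""
    return [rng.gauss(0.0, var ** 0.5) for var in variances]


def rollout(A_seq, B_seq, K_seq, w_var, x0_var, rng):
    """Simulate x_{t+1} = A_t x_t + B_t u_t + w_t for t = 0, ..., N-1.

    The feedback law u_t = -K_t x_t is a hypothetical policy choice.
    Returns the states [x_0, ..., x_N] and the controls [u_0, ..., u_{N-1}].
    """
    x = sample_gaussian(x0_var, rng)  # x_0 ~ N(0, X_0)
    states, controls = [x], []
    for A_t, B_t, K_t in zip(A_seq, B_seq, K_seq):
        u = [-k for k in mat_vec(K_t, x)]  # u_t = -K_t x_t
        w = sample_gaussian(w_var, rng)    # w_t ~ N(0, W)
        Ax, Bu = mat_vec(A_t, x), mat_vec(B_t, u)
        x = [a + b + n for a, b, n in zip(Ax, Bu, w)]
        states.append(x)
        controls.append(u)
    return states, controls


# Illustrative values only: state dimension m = 2, control dimension d = 1, horizon N = 5.
N = 5
A_seq = [[[1.0, 0.1], [0.0, 1.0]] for _ in range(N)]
B_seq = [[[0.0], [0.1]] for _ in range(N)]
K_seq = [[[0.5, 0.5]] for _ in range(N)]
rng = random.Random(0)
states, controls = rollout(A_seq, B_seq, K_seq, [0.01, 0.01], [1.0, 1.0], rng)
```

Sampling a general (non-diagonal) W would need a Cholesky factor of the covariance; the diagonal case is used here only to keep the sketch dependency-free.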

artificial intelligence, machine learning, probability, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.45)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

main_final

Neural Information Processing Systems, Apr-24-2026, 21:32:17 GMT

Direct policy search serves as one of the workhorses in modern reinforcement learning (RL), and its applications in continuous control tasks have recently attracted increasing attention. In this work, we investigate the convergence theory of policy gradient (PG) methods for learning linear risk-sensitive and robust controllers. In particular, we develop PG methods that can be implemented in a derivative-free fashion by sampling system trajectories, and establish both global convergence and sample complexity results for the solutions of two fundamental settings in risk-sensitive and robust control: the finite-horizon linear exponential quadratic Gaussian, and the finite-horizon linear-quadratic disturbance attenuation problems. As a by-product, our results also provide the first sample complexity for the global convergence of PG methods on solving zero-sum linear-quadratic dynamic games, a nonconvex-nonconcave minimax optimization problem that serves as a baseline setting in multi-agent reinforcement learning (MARL) with continuous spaces. One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved, which we term the implicit regularization property and which is an essential requirement in safety-critical control systems.

arxiv preprint arxiv, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.34)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

main_final

Neural Information Processing Systems, Feb-7-2026, 15:54:44 GMT

arxiv preprint arxiv, disturbance attenuation problem, matrix, (11 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Illinois (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.46)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity

Zhang, Kaiqing, Zhang, Xiangyuan, Hu, Bin, Başar, Tamer

arXiv.org Artificial Intelligence, Jan-4-2021

Direct policy search serves as one of the workhorses in modern reinforcement learning (RL), and its applications in continuous control tasks have recently attracted increasing attention. In this work, we investigate the convergence theory of policy gradient (PG) methods for learning linear risk-sensitive and robust controllers. In particular, we develop PG methods that can be implemented in a derivative-free fashion by sampling system trajectories, and establish both global convergence and sample complexity results for the solutions of two fundamental settings in risk-sensitive and robust control: the finite-horizon linear exponential quadratic Gaussian, and the finite-horizon linear-quadratic disturbance attenuation problems. As a by-product, our results also provide the first sample complexity for the global convergence of PG methods on solving zero-sum linear-quadratic dynamic games, a nonconvex-nonconcave minimax optimization problem that serves as a baseline setting in multi-agent reinforcement learning (MARL) with continuous spaces. One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved, which we term the implicit regularization property and which is an essential requirement in safety-critical control systems.
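To give a rough sense of what "derivative-free" means in this abstract, the following sketch shows a generic two-point zeroth-order gradient estimator that queries only cost values at randomly perturbed parameters. It is not the authors' algorithm: the toy quadratic `toy_cost` stands in for a cost that would in practice be estimated from sampled trajectories, and the function names, smoothing radius, and sample count are all hypothetical.

```python
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def zeroth_order_gradient(cost, theta, radius, num_samples, rng):
    """Two-point estimate of the gradient of `cost` at `theta`.

    Averages dim * (cost(theta + r u) - cost(theta - r u)) / (2 r) * u over
    random unit directions u; only cost evaluations are used, no derivatives.
    """
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(num_samples):
        u = sample_unit_sphere(dim, rng)
        plus = cost([t + radius * ui for t, ui in zip(theta, u)])
        minus = cost([t - radius * ui for t, ui in zip(theta, u)])
        scale = dim * (plus - minus) / (2.0 * radius * num_samples)
        for i in range(dim):
            grad[i] += scale * u[i]
    return grad


def toy_cost(theta):
    """Hypothetical stand-in cost: theta_1^2 + 2 * theta_2^2."""
    return theta[0] ** 2 + 2.0 * theta[1] ** 2


# The true gradient of toy_cost at (1, -1) is (2, -4); the estimate is noisy
# and approaches it as num_samples grows.
grad_est = zeroth_order_gradient(toy_cost, [1.0, -1.0], 0.1, 2000, random.Random(0))
```

The estimate can then be plugged into an ordinary gradient step on the parameters; how the paper chooses the perturbation distribution, step sizes, and sample counts is not stated in this abstract.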

matrix, probability, sequence, (14 more...)

arXiv.org Artificial Intelligence

2101.01041

Country: