AITopics | control policy

Collaborating Authors

control policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Safe and Stable Control via Guided Diffusion Models

Neural Information Processing SystemsJun-22-2026, 21:56:10 GMT

Diffusion models have made significant strides in recent years, exhibiting strong generalization capabilities in planning and control tasks. However, most diffusionbased policies remain focused on reward maximization or cost minimization, often overlooking critical aspects of safety and stability. In this work, we propose Safe and Stable Diffusion (S2Diff), a model-based diffusion framework that explores how diffusion models can ensure safety and stability from a Lyapunov perspective. We demonstrate that S2Diff eliminates the reliance on both complex gradientbased solvers (e.g., quadratic programming, non-convex solvers) and controlaffine structures, leading to globally valid control policies driven by the learned certificate functions. Additionally, we uncover intrinsic connections between diffusion sampling and Almost Lyapunov theory, enabling the use of trajectorylevel control policies to learn better certificate functions for safety and stability guarantees. To validate our approach, we conduct experiments on a wide variety of dynamical control systems, where S2Diff consistently outperforms both certificatebased controllers and model-based diffusion baselines in terms of safety, stability, and overall control performance.

artificial intelligence, machine learning, trajectory, (18 more...)

Neural Information Processing Systems

Country: Europe (0.67)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

Globally Optimal Policy Gradient Algorithms for Reinforcement Learning with PID Control Policies

Neural Information Processing SystemsJun-14-2026, 06:21:02 GMT

RL enables learning control policies through direct interaction with a system, without explicit model knowledge that is typically assumed in classical control. The PID policy architecture offers built-in structural advantages, such as superior tracking performance, elimination of steady-state errors, and robustness to model error that have made it a widely adopted paradigm in practice. Despite these advantages, the PID parameterization has received limited attention in the RL literature, and PID control designs continue to rely on heuristic tuning rules without theoretical guarantees. We address this gap by rigorously integrating PID control with RL, offering theoretical guarantees while maintaining the practical advantages that have made PID control ubiquitous in practice. Specifically, we first formulate PID control design as an optimization problem with a control policy that is parameterized by proportional, integral, and derivative components. We derive exact expressions for policy gradients in these parameters, and leverage them to develop both model-based and model-free policy gradient algorithms for PID policies. We then establish gradient dominance properties of the PID policy optimization problem, and provide theoretical guarantees on convergence and global optimality in this setting.

artificial intelligence, name change, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.82)

Add feedback

Safe and Stable Control via Lyapunov-Guided Diffusion Models

Neural Information Processing SystemsJun-14-2026, 05:18:04 GMT

Diffusion models have made significant strides in recent years, exhibiting strong generalization capabilities in planning and control tasks. However, most diffusion-based policies remain focused on reward maximization or cost minimization, often overlooking critical aspects of safety and stability. In this work, we propose Safe and Stable Diffusion ($S^2$Diff), a model-based framework that explores how diffusion models can ensure safety and stability from a Lyapunov perspective. We demonstrate that $S^2$Diff eliminates the reliance on both complex gradient-based solvers (e.g., quadratic programming, non-convex solvers) and control-affine structures, leading to globally valid control policies driven by the learned certificate functions. Additionally, we uncover intrinsic connections between diffusion sampling and almost Lyapunov theory, enabling the use of trajectory-level control policies to learn better certificate functions for safety and stability guarantees. To validate our approach, we conduct experiments on a wide variety of dynamical control systems, where $S^2$Diff consistently outperforms both certificate-based controllers and model-based diffusion baselines in terms of safety, stability, and overall control performance.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Convergent Functions, Divergent Forms

Neural Information Processing SystemsJun-11-2026, 21:56:06 GMT

We introduce LOKI, a compute-efficient framework for co-designing morphologies and control policies that generalize across unseen tasks. Inspired by biological adaptation--where animals quickly adjust to morphological changes--our method overcomes the inefficiencies of traditional evolutionary and quality-diversity algorithms. We propose learning convergent functions: shared control policies trained across clusters of morphologically similar designs in a learned latent space, drastically reducing the training cost per design. Simultaneously, we promote divergent forms by replacing mutation with dynamic local search, enabling broader exploration and preventing premature convergence. The policy reuse allows us to explore $\sim780\times$ more designs using 78\% fewer simulation steps and 40\% less compute per design. Local competition paired with a broader search results in a diverse set of high-performing final morphologies. Using the UNIMAL design space and a flat-terrain locomotion task, LOKI discovers a rich variety of designs--ranging from quadrupeds to crabs, bipedals, and spinners--far more diverse than those produced by prior work. These morphologies also transfer better to unseen downstream tasks in agility, stability, and manipulation domains (e.g.

artificial intelligence, name change, proceedings, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.39)

Add feedback

Co-Learning Port-Hamiltonian Systems and Optimal Energy-Shaping Control

Kamboj, Ankur, Dey, Biswadip, Srivastava, Vaibhav

arXiv.org Machine LearningMay-7-2026

We develop a physics-informed learning framework for energy-shaping control of port-Hamiltonian (pH) systems from trajectory data. The proposed approach co-learns a pH system model and an optimal energy-balancing passivity-based controller (EB-PBC) through alternating optimization with policy-aware data collection. At each iteration, the system model is refined using trajectory data collected under the current control policy, and the controller is re-optimized on the updated model. Both components are parameterized by neural networks that embed the pH dynamics and EB-PBC structure, ensuring interpretability in terms of energy interactions. The learned controller renders the closed-loop system inherently passive and provably stable, and exploits passive plant dynamics without canceling the natural potential. A dissipation regularization enforces strict energy decay during training, thereby enhancing robustness to sim-to-real gaps. The proposed framework is validated on state-regulation and swing-up tasks for planar and torsional pendulum systems.

artificial intelligence, controller, machine learning, (16 more...)

arXiv.org Machine Learning

2604.26172

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.64)

Industry: Energy (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Neural Lyapunov Control for Discrete-Time Systems

Neural Information Processing SystemsApr-24-2026, 12:48:42 GMT

While ensuring stability for linear systems is well understood, it remains a major challenge for nonlinear systems. A general approach in such cases is to compute a combination of a Lyapunov function and an associated control policy. However, finding Lyapunov functions for general nonlinear systems is a challenging task. To address this challenge, several methods have been proposed that represent Lyapunov functions using neural networks. However, such approaches either focus on continuous-time systems, or highly restricted classes of nonlinear dynamics.

artificial intelligence, lyapunov function, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Efficient Multi-task Reinforcement Learning with Cross-Task Policy Guidance

Neural Information Processing SystemsMar-22-2026, 15:31:32 GMT

Multi-task reinforcement learning endeavors to efficiently leverage shared information across various tasks, facilitating the simultaneous learning of multiple tasks. Existing approaches primarily focus on parameter sharing with carefully designed network structures or tailored optimization procedures. However, they overlook a direct and complementary way to exploit cross-task similarities: the control policies of tasks already proficient in some skills can provide explicit guidance for unmastered tasks to accelerate skills acquisition. To this end, we present a novel framework called Cross-Task Policy Guidance (CTPG), which trains a guide policy for each task to select the behavior policy interacting with the environment from all tasks' control policies, generating better training trajectories. In addition, we propose two gating mechanisms to improve the learning efficiency of CTPG: one gate filters out control policies that are not beneficial for guidance, while the other gate blocks tasks that do not necessitate guidance. CTPG is a general framework adaptable to existing parameter sharing approaches. Empirical evaluations demonstrate that incorporating CTPG with these approaches significantly enhances performance in manipulation and locomotion benchmarks.

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.63)

Add feedback

Efficient Morphology-Control Co-Design via Stackelberg Proximal Policy Optimization

Dai, Yanning, Wang, Yuhui, Ashley, Dylan R., Schmidhuber, Jürgen

arXiv.org Machine LearningMar-17-2026

Morphology-control co-design concerns the coupled optimization of an agent's body structure and control policy. This problem exhibits a bi-level structure, where the control dynamically adapts to the morphology to maximize performance. Existing methods typically neglect the control's adaptation dynamics by adopting a single-level formulation that treats the control policy as fixed when optimizing morphology. This can lead to inefficient optimization, as morphology updates may be misaligned with control adaptation. In this paper, we revisit the co-design problem from a game-theoretic perspective, modeling the intrinsic coupling between morphology and control as a novel variant of a Stackelberg game. We propose Stackelberg Proximal Policy Optimization (Stackelberg PPO), which explicitly incorporates the control's adaptation dynamics into morphology optimization. By modeling this intrinsic coupling, our method aligns morphology updates with control adaptation, thereby stabilizing training and improving learning efficiency. Experiments across diverse co-design tasks demonstrate that Stackelberg PPO outperforms standard PPO in both stability and final performance, opening the way for dramatically more efficient robotics designs.

artificial intelligence, machine learning, morphology, (18 more...)

arXiv.org Machine Learning

2603.15388

Country:

Europe > Switzerland (0.04)
Europe > Denmark (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Game Theory (0.88)
Information Technology > Artificial Intelligence > Robots (0.67)

Add feedback

Exact Verification of ReLU Neural Control Barrier Functions

Neural Information Processing SystemsFeb-19-2026, 00:13:11 GMT

In CBF-based control, the desired safety properties of the system are mapped to nonnegativity of a CBF, and the control input is chosen to ensure that the CBF remains nonnegative for all time.

artificial intelligence, machine learning, ncbf, (18 more...)

Neural Information Processing Systems

Country: