control problem
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > Middle East > Jordan (0.04)
Tight Rates for Bandit Control Beyond Quadratics
Unlike classical control theory, such as Linear Quadratic Control (LQC), real-world control problems are highly complex. These problems often involve adversarial perturbations, bandit feedback models, and non-quadratic, adversarially chosen cost functions. A fundamental yet unresolved question is whether optimal regret can be achieved for these general control problems. The standard approach to addressing this problem involves a reduction to bandit convex optimization with memory. In the bandit setting, constructing a gradient estimator with low variance is challenging due to the memory structure and non-quadratic loss functions.In this paper, we provide an affirmative answer to this question. Our main contribution is an algorithm that achieves an $\tilde{O}(\sqrt{T})$ optimal regret for bandit non-stochastic control with strongly-convex and smooth cost functions in the presence of adversarial perturbations, improving the previously known $\tilde{O}(T^{2/3})$ regret bound from \citep{cassel2020bandit}. Our algorithm overcomes the memory issue by reducing the problem to Bandit Convex Optimization (BCO) without memory and addresses general strongly-convex costs using recent advancements in BCO from \citep{suggala2024second}. Along the way, we develop an improved algorithm for BCO with memory, which may be of independent interest.
Neural Lyapunov Control
We propose new methods for learning control policies and neural network Lyapunov functions for nonlinear control problems, with provable guarantee of stability. The framework consists of a learner that attempts to find the control and Lyapunov functions, and a falsifier that finds counterexamples to quickly guide the learner towards solutions. The procedure terminates when no counterexample is found by the falsifier, in which case the controlled nonlinear system is provably stable. The approach significantly simplifies the process of Lyapunov control design, provides end-to-end correctness guarantee, and can obtain much larger regions of attraction than existing methods such as LQR and SOS/SDP. We show experiments on how the new methods obtain high-quality solutions for challenging robot control problems such as path tracking for wheeled vehicles and humanoid robot balancing.
Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions
Bhole, Ajinkya, Filabadi, Mohammad Mahmoudi, Crevecoeur, Guillaume, Lefebvre, Tom
This paper develops a unified perspective on several stochastic optimal control formulations through the lens of Kullback-Leibler regularization. We propose a central problem that separates the KL penalties on policies and transitions, assigning them independent weights, thereby generalizing the standard trajectory-level KL-regularization commonly used in probabilistic and KL-regularized control. This generalized formulation acts as a generative structure allowing to recover various control problems. These include the classical Stochastic Optimal Control (SOC), Risk-Sensitive Optimal Control (RSOC), and their policy-based KL-regularized counterparts. The latter we refer to as soft-policy SOC and RSOC, facilitating alternative problems with tractable solutions. Beyond serving as regularized variants, we show that these soft-policy formulations majorize the original SOC and RSOC problem. This means that the regularized solution can be iterated to retrieve the original solution. Furthermore, we identify a structurally synchronized case of the risk-seeking soft-policy RSOC formulation, wherein the policy and transition KL-regularization weights coincide. Remarkably, this specific setting gives rise to several powerful properties such as a linear Bellman equation, path integral solution, and, compositionality, thereby extending these computationally favourable properties to a broad class of control problems.
Fault Tolerant Control of Mecanum Wheeled Mobile Robots
Ma, Xuehui, Zhang, Shiliang, Sun, Zhiyong
Mecanum wheeled mobile robots (MWMRs) are highly susceptible to actuator faults that degrade performance and risk mission failure. Current fault tolerant control (FTC) schemes for MWMRs target complete actuator failures like motor stall, ignoring partial faults e.g., in torque degradation. We propose an FTC strategy handling both fault types, where we adopt posterior probability to learn real-time fault parameters. We derive the FTC law by aggregating probability-weighed control laws corresponding to predefined faults. This ensures the robustness and safety of MWMR control despite varying levels of fault occurrence. Simulation results demonstrate the effectiveness of our FTC under diverse scenarios.
- Europe > Norway > Eastern Norway > Oslo (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
Global Convergence of Policy Gradient for Entropy Regularized Linear-Quadratic Control with Multiplicative Noise
Diaz, Gabriel, Li, Lucky, Zhang, Wenhao
Reinforcement Learning (RL) has emerged as a powerful framework for sequential decision-making in dynamic environments, particularly when system parameters are unknown. This paper investigates RL-based control for entropy-regularized linear-quadratic (LQ) control problems with multiplicative noise over an infinite time horizon. First, we adapt the regularized policy gradient (RPG) algorithm to stochastic optimal control settings, proving that despite the non-convexity of the problem, RPG converges globally under conditions of gradient domination and almost-smoothness. Second, based on zero-order optimization approach, we introduce a novel model free RL algorithm: Sample-based regularized policy gradient (SB-RPG). SB-RPG operates without knowledge of system parameters yet still retains strong theoretical guarantees of global convergence. Our model leverages entropy regularization to address the exploration versus exploitation trade-off inherent in RL. Numerical simulations validate the theoretical results and demonstrate the efficiency of SB-RPG in unknown-parameters environments.
- North America > United States > California > Alameda County > Berkeley (0.14)
- Asia > China > Hong Kong (0.04)
- Research Report (1.00)
- Overview (1.00)
Multistage Campaigning in Social Networks
Mehrdad Farajtabar, Xiaojing Ye, Sahar Harati, Le Song, Hongyuan Zha
We consider the problem of how to optimize multi-stage campaigning over social networks. The dynamic programming framework is employed to balance the high present reward and large penalty on low future outcome in the presence of extensive uncertainties. In particular, we establish theoretical foundations of optimal campaigning over social networks where the user activities are modeled as a multivariate Hawkes process, and we derive a time dependent linear relation between the intensity of exogenous events and several commonly used objective functions of campaigning. We further develop a convex dynamic programming framework for determining the optimal intervention policy that prescribes the required level of external drive at each stage for the desired campaigning result. Experiments on both synthetic data and the real-world MemeTracker dataset show that our algorithm can steer the user activities for optimal campaigning much more accurately than baselines.
- North America > United States (0.14)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Government (0.93)
- Information Technology > Services (0.82)
- North America > United States > New Hampshire (0.04)
- North America > United States > Massachusetts (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
- (2 more...)
- North America > United States > New Hampshire (0.04)
- North America > United States > Massachusetts (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
- (2 more...)