Reinforcement Learning for Control with Multiple Frequencies
Many real-world sequential decision problems involve multiple action variables whose control frequencies are different, so that actions take effect over different periods. While these problems can be formulated with the notion of multiple action persistences in factored-action MDPs (FA-MDPs), it is non-trivial to solve them efficiently, since an action-persistent policy constructed from a stationary policy can be arbitrarily suboptimal, rendering solution methods for standard FA-MDPs hardly applicable. In this paper, we formalize the problem of multiple control frequencies in RL and provide an efficient solution method. Our proposed method, Action-Persistent Policy Iteration (AP-PI), provides a theoretical guarantee of convergence to an optimal solution while incurring only a factor-of-$|A|$ increase in time complexity during the policy improvement step, compared to standard policy iteration for FA-MDPs.
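The abstract contrasts AP-PI with standard policy iteration for FA-MDPs. As background, here is a minimal sketch of that baseline on a randomly generated toy MDP; the 3-state, 2-action problem and all numbers are illustrative, not from the paper.

```python
import numpy as np

# Standard policy iteration on a tiny random MDP -- the baseline that
# AP-PI extends with action persistence. All sizes are illustrative.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state dist
R = rng.standard_normal((n_states, n_actions))                    # R[s, a] = reward

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[np.arange(n_states), policy]
    R_pi = R[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily w.r.t. one-step lookahead Q-values.
    Q = R + gamma * P @ V
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print(policy, np.allclose(V, Q.max(axis=1)))  # at convergence V is the greedy value
```

AP-PI's point is that when actions persist at different frequencies, a stationary greedy policy of this kind can be arbitrarily suboptimal, so the improvement step must account for the persistence pattern.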
Six-DoF Stewart Platform Motion Simulator Control using Switchable Model Predictive Control
Zhao, Jiangwei, Xu, Zhengjia, Wu, Dongsu, Cao, Yingrui, Xie, Jinpeng
Owing to its excellent mechanical characteristics of high rigidity, maneuverability, and strength-to-weight ratio, the 6-Degree-of-Freedom (DoF) Stewart structure is widely adopted in flight simulator platforms for replicating motion sensations during pilot training. Unlike maneuvers handled by conventional serial-link manipulator based mechanisms, Upset Prevention and Recovery Training (UPRT) in complex flight states is often accompanied by large speeds and violent rates of change in the simulator's angular velocity. However, the Classical Washout Filter (CWF) based Motion Cueing Algorithm (MCA) is limited in providing the rapid motor responses needed to meet high-accuracy performance requirements. This paper exploits a Model Predictive Control (MPC) based MCA, which has proven effective in Hexapod-based motion simulators through control over a limited linear workspace. To address the uncertainties and control solution errors arising from the extraction of Terminal Constraints (COTC), this paper proposes a Switchable Model Predictive Control (S-MPC) based MCA under a model-adaptive architecture. Within the simulator's operating envelope, highly accurate tracking is achievable using the MPC-based MCA with COTC; outside the envelope, the proposed method provides optimal tracking solutions by switching to an MPC-based MCA without COTC. In a UPRT demonstration under horizontal stall conditions, evaluated with the Average Absolute Scale (AAS) criterion, the proposed S-MPC based MCA outperforms the MPC-based MCA and the CWF-based MCA by 42.34% and 65.30%, respectively.
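The washout idea behind a CWF-based MCA can be sketched in a few lines: a high-pass filter passes acceleration transients to the platform but washes out sustained cues, so the platform drifts back toward neutral within its limited workspace. A minimal first-order discrete sketch; the cutoff frequency and sample time are illustrative, not taken from the paper.

```python
import numpy as np

# First-order discrete high-pass filter, the core washout stage of a
# classical washout MCA. Sustained input decays to zero at the output.
def washout_highpass(signal, dt=0.01, cutoff_hz=0.5):
    alpha = 1.0 / (1.0 + 2.0 * np.pi * cutoff_hz * dt)
    out = np.zeros_like(signal)
    for k in range(1, len(signal)):
        out[k] = alpha * (out[k - 1] + signal[k] - signal[k - 1])
    return out

t = np.arange(0, 5, 0.01)
step = np.where(t >= 1.0, 1.0, 0.0)   # sustained 1 m/s^2 acceleration demand
cmd = washout_highpass(step)
# The transient passes almost fully; the sustained part is washed out.
print(cmd.max(), cmd[-1])
```

The MPC-based MCA replaces this fixed filter with an online optimization over the platform's workspace constraints, which is where the terminal-constraint issues the paper addresses arise.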
Generative Predictive Control: Flow Matching Policies for Dynamic and Difficult-to-Demonstrate Tasks
Kurtz, Vince, Burdick, Joel W.
Generative control policies have recently unlocked major progress in robotics. These methods produce action sequences via diffusion or flow matching, with training data provided by demonstrations. But despite enjoying considerable success on difficult manipulation problems, generative policies come with two key limitations. First, behavior cloning requires expert demonstrations, which can be time-consuming and expensive to obtain. Second, existing methods are limited to relatively slow, quasi-static tasks. In this paper, we leverage a tight connection between sampling-based predictive control and generative modeling to address each of these issues. In particular, we introduce generative predictive control, a supervised learning framework for tasks with fast dynamics that are easy to simulate but difficult to demonstrate. We then show how trained flow-matching policies can be warm-started at run-time, maintaining temporal consistency and enabling fast feedback rates. We believe that generative predictive control offers a complementary approach to existing behavior cloning methods, and hope that it paves the way toward generalist policies that extend beyond quasi-static demonstration-oriented tasks.
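The flow-matching objective such policies train on reduces to simple regression targets. A minimal sketch under the common straight-line (linear-interpolation) probability path, with a least-squares linear model standing in for the policy network; this is an assumption for brevity, and real policies also condition on observations.

```python
import numpy as np

# Conditional flow matching in miniature: along the straight-line path
# x_t = (1 - t) * x0 + t * x1, the regression target for the velocity
# field is simply x1 - x0. Data and model here are toy stand-ins.
rng = np.random.default_rng(0)
x1 = rng.standard_normal((256, 2)) + 3.0   # "demonstrated" action samples
x0 = rng.standard_normal((256, 2))         # noise samples
t = rng.uniform(size=(256, 1))
xt = (1 - t) * x0 + t * x1
target_v = x1 - x0                         # conditional velocity target

# Least-squares fit of v(x_t, t) ~ W^T [x_t, t, 1] in place of a network.
feats = np.hstack([xt, t, np.ones((256, 1))])
W, *_ = np.linalg.lstsq(feats, target_v, rcond=None)
loss = np.mean((feats @ W - target_v) ** 2)
print(loss)
```

Sampling then integrates the learned velocity field from noise to actions; the paper's contribution is to generate the training pairs from sampling-based predictive control rollouts rather than human demonstrations.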
Lie-algebra Adaptive Tracking Control for Rigid Body Dynamics
Tang, Jiawei, Li, Shilei, Shi, Ling
Adaptive tracking control for rigid body dynamics is of critical importance in control and robotics, particularly for addressing uncertainties or variations in system model parameters. However, most existing adaptive control methods are designed for systems with states in vector spaces, often neglecting the manifold constraints inherent to robotic systems. In this work, we propose a novel Lie-algebra-based adaptive control method that leverages the intrinsic relationship between the special Euclidean group and its associated Lie algebra. By transforming the state space from the group manifold to a vector space, we derive a linear error dynamics model that decouples model parameters from the system state. This formulation enables the development of an adaptive optimal control method that is both geometrically consistent and computationally efficient. Extensive simulations demonstrate the effectiveness and efficiency of the proposed method. We have made our source code publicly available to the community to support further research and collaboration.
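The core transformation, mapping between the group manifold and its Lie algebra, can be illustrated for the rotation part: a minimal exp/log pair for SO(3) via Rodrigues' formula. SE(3) adds the translation block analogously; this is textbook background, not the paper's controller.

```python
import numpy as np

# exp maps a rotation vector (Lie algebra so(3)) to a rotation matrix
# (Lie group SO(3)); log inverts it. Errors expressed in the algebra
# live in a vector space, which is what enables linear error dynamics.
def hat(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(w / theta)  # Rodrigues' formula
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def log_so3(R):
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    w_hat = (R - R.T) * theta / (2 * np.sin(theta))
    return np.array([w_hat[2, 1], w_hat[0, 2], w_hat[1, 0]])

w = np.array([0.3, -0.2, 0.5])
R = exp_so3(w)
print(np.allclose(log_so3(R), w))  # round trip holds for |w| < pi
```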
Synthesis of Model Predictive Control and Reinforcement Learning: Survey and Classification
Reiter, Rudolf, Hoffmann, Jasper, Reinhardt, Dirk, Messerer, Florian, Baumgärtner, Katrin, Sawant, Shamburaj, Boedecker, Joschka, Diehl, Moritz, Gros, Sebastien
Model Predictive Control (MPC) and Reinforcement Learning (RL) are two successful control techniques for Markov decision processes. Both approaches are derived from similar fundamental principles, and both are widely used in practical applications, including robotics, process control, energy systems, and autonomous driving. Despite their similarities, MPC and RL follow distinct paradigms that emerged from diverse communities and different requirements. Various technical discrepancies, particularly the role of an environment model within the algorithm, lead to methodologies with nearly complementary advantages. Due to their orthogonal benefits, research interest in combination methods has recently increased significantly, producing a large and growing set of complex ideas that leverage both MPC and RL. This work illuminates the differences, similarities, and fundamentals that allow for different combination algorithms, and categorizes existing work accordingly. In particular, we focus on the versatile actor-critic RL approach as the basis for our categorization and examine how the online optimization approach of MPC can be used to improve the overall closed-loop performance of a policy.
Event-Based Adaptive Koopman Framework for Optic Flow-Guided Landing on Moving Platforms
Banday, Bazeela, Sah, Chandan Kumar, Keshavan, Jishnu
This paper presents an optic flow-guided approach for achieving soft landings by resource-constrained unmanned aerial vehicles (UAVs) on dynamic platforms. An offline data-driven linear model based on Koopman operator theory is developed to describe the underlying (nonlinear) dynamics of optic flow output obtained from a single monocular camera that maps to vehicle acceleration as the control input. Moreover, a novel adaptation scheme within the Koopman framework is introduced online to handle uncertainties such as unknown platform motion and ground effect, which exert a significant influence during the terminal stage of the descent process. Further, to minimize computational overhead, an event-based adaptation trigger is incorporated into an event-driven Model Predictive Control (MPC) strategy to regulate optic flow and track a desired reference. A detailed convergence analysis ensures global convergence of the tracking error to a uniform ultimate bound while ensuring Zeno-free behavior. Simulation results demonstrate the algorithm's robustness and effectiveness in landing on dynamic platforms under ground effect and sensor noise, which compares favorably to non-adaptive event-triggered and time-triggered adaptive schemes.
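The offline data-driven linear model can be illustrated with Extended Dynamic Mode Decomposition (EDMD), a standard way to fit a Koopman matrix from data; the scalar dynamics and monomial dictionary below are illustrative stand-ins, not the paper's optic-flow model.

```python
import numpy as np

# EDMD sketch: choose a dictionary of observables psi, collect state
# pairs, and fit a linear Koopman matrix K so psi(x_next) ~ K @ psi(x).
def psi(x):
    return np.stack([np.ones_like(x), x, x**2, x**3])  # lifting dictionary

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
x_next = 0.9 * x - 0.1 * x**3   # "unknown" nonlinear dynamics to identify

Psi, Psi_next = psi(x), psi(x_next)
K = Psi_next @ np.linalg.pinv(Psi)   # least-squares Koopman approximation

# One-step prediction through the lifted linear model.
x_test = 0.5
pred = (K @ psi(np.array([x_test])))[1, 0]   # second observable is x itself
print(pred, 0.9 * x_test - 0.1 * x_test**3)
```

The paper's online adaptation then updates this linear model during descent to absorb effects (platform motion, ground effect) absent from the offline data, with an event trigger deciding when an update is worth its computational cost.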
Benchmarking Different QP Formulations and Solvers for Dynamic Quadrupedal Walking
Stark, Franek, Middelberg, Jakob, Mronga, Dennis, Vyas, Shubham, Kirchner, Frank
Quadratic Programs (QPs) are widely used in the control of walking robots, especially in Model Predictive Control (MPC) and Whole-Body Control (WBC). In both cases, controller design requires formulating a QP and selecting a suitable QP solver, both of which demand considerable time and expertise. While computational performance benchmarks exist for QP solvers, studies comparing optimal combinations of computational hardware (HW), QP formulation, and solver performance are lacking. In this work, we compare dense and sparse QP formulations and multiple solving methods on different HW architectures, focusing on their computational efficiency for dynamic walking of four-legged robots using MPC. We introduce the Solve Frequency per Watt (SFPW) as a performance measure to enable a cross-hardware comparison of QP solver efficiency. We also benchmark different QP solvers for the WBC that we use for trajectory stabilization in quadrupedal walking. As a result, this paper provides recommendations for selecting QP formulations and solvers for different HW architectures in walking robots, and indicates which problems in this domain deserve the greatest technical effort in the future.
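At its smallest, the kind of problem being benchmarked is an equality-constrained QP solved through its KKT system; dense and sparse formulations differ in how the matrices are assembled and factorized, not in this underlying math. A minimal sketch with illustrative numbers:

```python
import numpy as np

# Equality-constrained QP:  min 0.5 x^T H x + g^T x  s.t.  A x = b,
# solved directly via the KKT linear system. Numbers are illustrative.
H = np.array([[4.0, 1.0], [1.0, 2.0]])   # positive-definite Hessian
g = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0]])               # single equality constraint
b = np.array([1.0])

KKT = np.block([[H, A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([-g, b])
sol = np.linalg.solve(KKT, rhs)
x, lam = sol[:2], sol[2:]                # primal solution and multiplier
print(x, A @ x)   # by hand: x = [0.25, 0.75], and A @ x = b exactly
```

Real MPC and WBC problems add inequality constraints (friction cones, torque limits), which is where the solver methods being benchmarked, active-set, interior-point, and first-order, diverge in cost.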
Increasing Information for Model Predictive Control with Semi-Markov Decision Processes
Boucher, Rémy Hosseinkhan, Semeraro, Onofrio, Mathelin, Lionel
Recent works in Learning-Based Model Predictive Control of dynamical systems show impressive sample-complexity performance, using criteria from Information Theory to accelerate the learning procedure. However, sequential exploration opportunities are limited by the system's local state, restricting the information content of observations along the current exploration trajectory. This article resolves this limitation by introducing temporal abstraction through the framework of Semi-Markov Decision Processes. The framework increases the total information in the gathered data for a fixed sampling budget, thus reducing the sample complexity.
Fully Decentralized Policies for Multi-Agent Systems: An Information Theoretic Approach
Roel Dobbe, David Fridovich-Keil, Claire Tomlin
Learning cooperative policies for multi-agent systems is often challenged by partial observability and a lack of coordination. In some settings, the structure of a problem allows a distributed solution with limited communication. Here, we consider a scenario where no communication is available, and instead we learn local policies for all agents that collectively mimic the solution to a centralized multi-agent static optimization problem. Our main contribution is an information theoretic framework based on rate distortion theory which facilitates analysis of how well the resulting fully decentralized policies are able to reconstruct the optimal solution. Moreover, this framework provides a natural extension that addresses which nodes an agent should communicate with to improve the performance of its individual policy.
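The setup can be caricatured in a few lines: compute optimal actions for a centralized problem, then fit each agent a policy that sees only its own local observation; the residual error plays the role of distortion in the rate-distortion analysis. The quadratic problem below is hypothetical.

```python
import numpy as np

# Decentralization as lossy compression: each agent reconstructs its
# component of the centralized optimum from local information only.
rng = np.random.default_rng(0)
n, agents = 1000, 2
x = rng.standard_normal((n, agents))             # x[:, i] observed by agent i
u_star = x @ np.array([[1.0, 0.5],               # centralized optimal actions,
                       [0.5, 1.0]])              # each depends on BOTH states

distortion = []
for i in range(agents):
    xi = x[:, [i]]                               # agent i's local observation
    w, *_ = np.linalg.lstsq(xi, u_star[:, i], rcond=None)
    distortion.append(np.mean((xi @ w - u_star[:, i]) ** 2))
print(distortion)  # nonzero: local info cannot reconstruct u* exactly
```

The paper's rate-distortion framework quantifies exactly this gap and, usefully, identifies which additional observations (i.e., which communication links) would shrink it fastest.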
Full-Order Sampling-Based MPC for Torque-Level Locomotion Control via Diffusion-Style Annealing
Xue, Haoru, Pan, Chaoyi, Yi, Zeji, Qu, Guannan, Shi, Guanya
Due to high dimensionality and non-convexity, real-time optimal control using full-order dynamics models for legged robots is challenging, so Nonlinear Model Predictive Control (NMPC) approaches are often limited to reduced-order models. Sampling-based MPC has shown potential on nonconvex and even discontinuous problems, but often yields suboptimal solutions with high variance, which limits its application to high-dimensional locomotion. This work introduces DIAL-MPC (Diffusion-Inspired Annealing for Legged MPC), a sampling-based MPC framework with a novel diffusion-style annealing process. This annealing process is supported by a theoretical landscape analysis of Model Predictive Path Integral Control (MPPI) and by the connection between MPPI and single-step diffusion. Algorithmically, DIAL-MPC iteratively refines solutions online and achieves both global coverage and local convergence. In quadrupedal torque-level control tasks, DIAL-MPC reduces the tracking error of standard MPPI by a factor of $13.4$ and outperforms reinforcement learning (RL) policies by $50\%$ on challenging climbing tasks without any training. In particular, DIAL-MPC enables precise real-world quadrupedal jumping with payload. To the best of our knowledge, DIAL-MPC is the first training-free method that optimizes over full-order quadruped dynamics in real time.
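A single MPPI update, the sampling-based core that DIAL-MPC anneals, can be sketched on a toy 1-D double integrator; the horizon, noise scale, and temperature below are illustrative, not the paper's settings.

```python
import numpy as np

# One MPPI iteration: sample noisy control sequences, roll each out
# through the dynamics, then average the perturbations with exponential
# weights on cost. DIAL-MPC's contribution is annealing sigma over
# iterations, diffusion-style; here we show only the base update.
def rollout_cost(u_seq, x=0.0, v=0.0, dt=0.1, target=1.0):
    cost = 0.0
    for u in u_seq:            # 1-D double integrator: u -> v -> x
        v += u * dt
        x += v * dt
        cost += (x - target) ** 2 + 1e-3 * u ** 2
    return cost

rng = np.random.default_rng(0)
H, K, sigma, temp = 20, 256, 1.0, 0.1
u_nominal = np.zeros(H)
noise = rng.standard_normal((K, H)) * sigma
costs = np.array([rollout_cost(u_nominal + eps) for eps in noise])
weights = np.exp(-(costs - costs.min()) / temp)
weights /= weights.sum()
u_new = u_nominal + weights @ noise          # MPPI weighted update

print(rollout_cost(u_new) < rollout_cost(u_nominal))
```

Shrinking `sigma` across iterations trades the global coverage of large noise for the local refinement of small noise, which is the annealing schedule the paper analyzes.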