Fang, Yongchun
Robust Safety Critical Control Under Multiple State and Input Constraints: Volume Control Barrier Function Method
Dong, Jinyang, Wu, Shizhen, Liu, Rui, Liang, Xiao, Lu, Biao, Fang, Yongchun
In this paper, the safety-critical control problem for uncertain systems under multiple control barrier function (CBF) constraints and input constraints is investigated. A novel framework is proposed to generate a safety filter that minimizes changes to reference inputs when safety risks arise, ensuring a balance between safety and performance. A nonlinear disturbance observer (DOB) based on the robust integral of the sign of the error (RISE) is used to estimate system uncertainties, ensuring that the estimation error converges to zero exponentially. This error bound is integrated into the safety-critical controller to reduce conservativeness while ensuring safety. To further address the challenges arising from multiple CBF and input constraints, a novel Volume CBF (VCBF) is proposed by analyzing the feasible space of the quadratic programming (QP) problem. To ensure that the feasible space does not vanish under disturbances, a DOB-VCBF-based method is introduced, ensuring system safety while maintaining the feasibility of the resulting QP. Subsequently, several groups of simulation and experimental results are provided to validate the effectiveness of the proposed controller.
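The safety filter above is realized through CBF-constrained quadratic programs. The following is a minimal sketch of such a CBF-QP filter, assuming single-integrator dynamics x_dot = u, a single circular-obstacle barrier h(x) = ||x - x_o||^2 - r^2, box input bounds, and the cvxpy solver; the paper's RISE-based DOB, the VCBF construction, and the handling of multiple simultaneous constraints are not reproduced here.

```python
# Minimal CBF-QP safety filter sketch (see the assumptions stated above).
import numpy as np
import cvxpy as cp

def cbf_qp_filter(x, u_ref, x_obs, r_obs, alpha=1.0, u_max=1.0):
    """Minimally modify u_ref so that the barrier h stays nonnegative."""
    h = np.dot(x - x_obs, x - x_obs) - r_obs**2       # barrier value
    grad_h = 2.0 * (x - x_obs)                        # dh/dx for x_dot = u
    u = cp.Variable(2)
    constraints = [grad_h @ u + alpha * h >= 0,       # CBF condition: h_dot >= -alpha*h
                   cp.abs(u) <= u_max]                # box input constraint
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_ref)), constraints)
    prob.solve()
    return u.value if prob.status == "optimal" else None

# Nominal input drives the robot straight toward the obstacle; the filter
# returns the closest admissible input that keeps the system safe.
u_safe = cbf_qp_filter(x=np.array([0.0, 0.0]), u_ref=np.array([1.0, 0.0]),
                       x_obs=np.array([1.5, 0.0]), r_obs=0.5)
print(u_safe)
```

When the nominal input already satisfies the constraints, the filter returns it unchanged; otherwise it projects the input onto the constraint set in the least-squares sense.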
Polytope Volume Monitoring Problem: Formulation and Solution via Parametric Linear Program Based Control Barrier Function
Wu, Shizhen, Dong, Jinyang, Fang, Xu, Sun, Ning, Fang, Yongchun
Motivated by the latest research on feasible space monitoring of multiple control barrier functions (CBFs) as well as polytopic collision avoidance, this paper studies the Polytope Volume Monitoring (PVM) problem, whose goal is to design a control law for the inputs of nonlinear systems that prevents the volume of some state-dependent polytope from decreasing to zero. Recent studies have explored the idea of applying the Chebyshev ball method from optimization theory to a case study of PVM; however, the underlying difficulties caused by nonsmoothness have not been addressed. This paper continues the study of this topic, where our main contribution is to establish, for the first time, the relationship between nonsmooth CBFs and parametric optimization theory through directional derivatives, so as to solve PVM problems more conveniently. In detail, inspired by the Chebyshev ball approach, a parametric linear program (PLP) based nonsmooth barrier function candidate is established for PVM; then, sufficient conditions for it to be a nonsmooth CBF are proposed, based on which a quadratic program (QP) based safety filter with guaranteed feasibility is constructed to address PVM problems. Finally, a numerical simulation example is given to demonstrate the efficiency of the proposed safety filter.
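The Chebyshev ball idea referenced above measures a polytope {x : Ax <= b} by the radius of its largest inscribed ball, which is the optimal value of a linear program. Below is a minimal sketch assuming the polytope is given in halfspace form; the parametric-LP barrier candidate and its nonsmooth-CBF conditions from the paper are not reproduced.

```python
# Chebyshev ball of a polytope {x : A x <= b} via a linear program.
import numpy as np
from scipy.optimize import linprog

def chebyshev_ball(A, b):
    """Return (center, radius) of the largest ball inscribed in {x : A x <= b}."""
    m, n = A.shape
    row_norms = np.linalg.norm(A, axis=1)
    # Variables z = [center (n entries), radius (1 entry)]; maximize the radius.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    A_ub = np.hstack([A, row_norms[:, None]])         # a_i^T c + ||a_i|| r <= b_i
    bounds = [(None, None)] * n + [(0, None)]         # radius must stay nonnegative
    res = linprog(c, A_ub=A_ub, b_ub=b, bounds=bounds)
    return res.x[:n], res.x[-1]

# Unit box: center (0, 0), radius 1.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
print(chebyshev_ball(A, b))
```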
Optimization-free Smooth Control Barrier Function for Polygonal Collision Avoidance
Wu, Shizhen, Fang, Yongchun, Sun, Ning, Lu, Biao, Liang, Xiao, Zhao, Yiming
Polygonal collision avoidance (PCA) is short for the problem of collision avoidance between two polygons (i.e., planar polytopes) that have their own dynamic equations. This problem suffers from the inherent difficulty of dealing with non-smooth boundaries, and recently optimization-defined metrics, such as the signed distance field (SDF) and its variants, have been proposed as control barrier functions (CBFs) to tackle PCA problems. In contrast, we propose an optimization-free smooth CBF method in this paper, which is computationally efficient and proven to be nonconservative. It is achieved by three main steps: a lower bound of the SDF is first expressed as a nested Boolean logic composition; then its smooth approximation is established by applying the latest log-sum-exp method; finally, a specified CBF-based safety filter is proposed to address this class of problems. To illustrate its wide applicability, the optimization-free smooth CBF method is extended to solve distributed collision avoidance of two underactuated nonholonomic vehicles and to drive an underactuated container crane to avoid a moving obstacle, for which numerical simulations are also performed.
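The log-sum-exp step above replaces the non-smooth max in the SDF lower bound with a differentiable surrogate. The sketch below shows the idea for a single convex polygon, assuming unit-norm facet normals and a shift by log(n)/kappa so the surrogate remains a lower bound; the nested Boolean composition for two moving polygons and the resulting safety filter are not reproduced.

```python
# Log-sum-exp smoothing of the SDF lower bound h(x) = max_i (a_i^T x - b_i).
import numpy as np

def smooth_max(v, kappa=20.0):
    """Smooth under-approximation of max(v), within log(len(v))/kappa of it."""
    m = v.max()
    lse = (np.log(np.sum(np.exp(kappa * (v - m)))) + kappa * m) / kappa
    return lse - np.log(len(v)) / kappa

def smooth_h(x, A, b, kappa=20.0):
    """Smooth barrier candidate for the convex polygon {x : A x <= b}."""
    return smooth_max(A @ x - b, kappa)

# Unit box with unit-norm facet normals; a point 0.5 outside the right facet.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
print(smooth_h(np.array([1.5, 0.0]), A, b))   # ~0.43, true value is 0.5
```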
Reflection of Episodes: Learning to Play Game from Expert and Self Experiences
Xu, Xiaojie, Li, Zongyuan, Lu, Chang, Qi, Runnan, Ni, Yanan, Jiang, Lumin, Liu, Xiangbei, Zhang, Xuebo, Fang, Yongchun, Huang, Kuihua, Guo, Xian, Wu, Zhanghua, Li, Zhenya
StarCraft II is a complex and dynamic real-time strategy (RTS) game environment, which is very suitable for artificial intelligence and reinforcement learning research. To address the problem of Large Language Model (LLM) learning in complex environments through self-reflection, we propose a Reflection of Episodes (ROE) framework based on expert experience and self-experience. This framework first obtains key information in the game through a keyframe selection method, then makes decisions based on expert experience and self-experience. After a game is completed, it reflects on the previous experience to obtain new self-experience. Finally, in the experiments, our method defeated the built-in agent at the Very Hard difficulty in TextStarCraft II. We analyze the LLM's in-game data in detail and verify the effectiveness of the proposed method.
Hierarchical Expert Prompt for Large-Language-Model: An Approach Defeat Elite AI in TextStarCraft II for the First Time
Li, Zongyuan, Lu, Chang, Xu, Xiaojie, Qi, Runnan, Ni, Yanan, Jiang, Lumin, Liu, Xiangbei, Zhang, Xuebo, Fang, Yongchun, Huang, Kuihua, Guo, Xian
Since the emergence of Large Language Models (LLMs), they have been widely used in fields such as writing, translation, and search. However, there is still great potential for LLM-based methods in handling complex tasks such as decision-making in the StarCraft II environment. To address problems such as the lack of relevant knowledge and poor control over subtasks of varying importance, we propose a Hierarchical Expert Prompt (HEP) for LLMs. Our method improves the understanding of game situations through expert-level tactical knowledge and improves the handling of tasks of varying importance through a hierarchical framework. Our approach defeated the highest-level (Elite) standard built-in agent in TextStarCraft II for the first time and consistently outperformed the baseline method at other difficulties. Our experiments suggest that the proposed method is a practical solution for tackling complex decision-making challenges. The replay video can be viewed at https://www.bilibili.com/video/BV1uz42187EF and https://youtu.be/dO3PshWLV5M, and our code has been open-sourced at https://github.com/luchang1113/HEP-LLM-play-StarCraftII.
Minimum Snap Trajectory Generation and Control for an Under-actuated Flapping Wing Aerial Vehicle
Qian, Chen, Chen, Rui, Shen, Peiyao, Fang, Yongchun, Yan, Jifu, Li, Tiefeng
This paper presents both the trajectory generation and tracking control strategies for an underactuated flapping wing aerial vehicle (FWAV). First, the FWAV dynamics is analyzed from a practical perspective. Then, based on these analyses, we demonstrate the differential flatness of the FWAV system and develop a general-purpose trajectory generation strategy. Subsequently, the trajectory tracking controller is developed with the help of robust control and switching control techniques. After that, the overall system's asymptotic stability is guaranteed by Lyapunov stability analysis. To make the controller applicable in real flight, we also provide several practical guidelines. Finally, a series of experimental results demonstrates the successful implementation of the proposed trajectory generation and tracking control strategies. This work is the first to achieve, at a practical level, the closed-loop integration of trajectory generation and control for real three-dimensional flight of an underactuated FWAV.
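Minimum snap generation, combined with differential flatness as above, typically reduces to solving for polynomial coefficients. A minimal sketch for one axis and one segment follows, assuming boundary constraints on position, velocity, acceleration, and jerk at both ends, in which case the snap-optimal segment is the unique degree-7 interpolant; the multi-segment joint optimization and the flatness-based mapping to FWAV inputs are not reproduced.

```python
# Minimum-snap segment sketch: one axis, one segment of duration T.
import numpy as np

def min_snap_segment(T, start, end):
    """start/end = (pos, vel, acc, jerk); returns degree-7 coefficients c_0..c_7."""
    def deriv_row(t, k):
        # Row mapping coefficients to the k-th derivative of sum_i c_i t^i at time t.
        row = np.zeros(8)
        for i in range(k, 8):
            row[i] = np.prod(range(i - k + 1, i + 1)) * t ** (i - k)
        return row
    A = np.vstack([deriv_row(0.0, k) for k in range(4)] +
                  [deriv_row(T, k) for k in range(4)])
    b = np.concatenate([start, end])
    return np.linalg.solve(A, b)

# Rest-to-rest motion from 0 to 1 m in 2 s.
coeffs = min_snap_segment(2.0, start=(0, 0, 0, 0), end=(1, 0, 0, 0))
print(coeffs)
```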
G$ \mathbf{^2} $VD Planner: Efficient Motion Planning With Grid-based Generalized Voronoi Diagrams
Wen, Jian, Zhang, Xuebo, Bi, Qingchen, Liu, Hui, Yuan, Jing, Fang, Yongchun
In this paper, an efficient motion planning approach with grid-based generalized Voronoi diagrams (G$ \mathbf{^2} $VD) is proposed for mobile robots. Different from existing approaches, the novelty of this work is twofold: 1) a new state lattice-based path searching approach is proposed, in which the search space is reduced to a Voronoi corridor to further improve the search efficiency, and a Voronoi potential field is constructed to keep the searched path a reasonable distance from obstacles, providing sufficient optimization margin for the subsequent path smoothing; 2) an efficient quadratic programming-based path smoothing approach is presented, wherein the clearance to obstacles is considered in the form of a penalty on the deviation from the safe reference path, improving the path clearance of hard-constrained path smoothing approaches. We validate the efficiency and smoothness of our approach in various challenging simulation scenarios and outdoor environments. It is shown that the computational efficiency is improved by 17.1% in the path searching stage, and path smoothing with the proposed approach is 25.3 times faster than an advanced sparse-banded structure-based path smoothing approach.
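The second contribution above trades smoothness against deviation from a safe reference path. Below is a minimal sketch of that trade-off, assuming the reference already keeps clearance (as the Voronoi potential field arranges), squared second differences as the smoothness cost, and pinned endpoints; the paper's full QP formulation and its sparse-banded solver comparison are not reproduced.

```python
# Least-squares path smoothing: smoothness cost plus deviation-from-reference penalty.
import numpy as np

def smooth_path(ref, w_dev=1.0):
    """ref: (N, 2) reference waypoints; returns the smoothed (N, 2) path."""
    n = len(ref)
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):                      # finite-difference curvature proxy
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    H = D2.T @ D2 + w_dev * np.eye(n)           # quadratic cost Hessian
    rhs = w_dev * np.asarray(ref, dtype=float)
    H[[0, -1]] = 0.0                            # pin the two endpoints
    H[0, 0] = H[-1, -1] = 1.0
    rhs[0], rhs[-1] = ref[0], ref[-1]
    return np.linalg.solve(H, rhs)

# A zig-zag reference gets smoothed while the endpoints stay fixed.
ref = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0], [4.0, 0.0]])
print(smooth_path(ref, w_dev=0.5))
```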
Learning Adaptable Risk-Sensitive Policies to Coordinate in Multi-Agent General-Sum Games
Liu, Ziyi, Fang, Yongchun
In general-sum games, the interaction of self-interested learning agents commonly leads to socially worse outcomes, such as defect-defect in the iterated stag hunt (ISH). Previous works address this challenge by sharing rewards or shaping their opponents' learning process, which requires overly strong assumptions. In this paper, we demonstrate that agents trained to optimize expected returns are more likely to choose a safe action that leads to guaranteed but lower rewards. However, there typically exists a risky action that leads to higher rewards in the long run only if agents cooperate, e.g., cooperate-cooperate in ISH. To overcome this, we propose using the action value distribution to characterize a decision's risk and corresponding potential payoffs. Specifically, we present the Adaptable Risk-Sensitive Policy (ARSP). ARSP learns the distributions over the agent's return and estimates a dynamic risk-seeking bonus to discover risky coordination strategies. Furthermore, to avoid overfitting to training opponents, ARSP learns an auxiliary opponent modeling task to infer opponents' types and dynamically alter the corresponding strategies during execution. Empirically, agents trained via ARSP can achieve stable coordination during training without accessing opponents' rewards or learning processes, and can adapt to non-cooperative opponents during execution. To the best of our knowledge, it is the first method to learn coordination strategies between agents in both the iterated prisoner's dilemma (IPD) and the iterated stag hunt (ISH) without shaping opponents or rewards, and it can adapt to opponents with distinct strategies during execution. Furthermore, we show that ARSP can be scaled to high-dimensional settings.
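The risk-seeking bonus above is computed from a learned return distribution. The sketch below only illustrates the idea with quantile estimates: the bonus is taken as the upper-tail mean minus the full mean, which is an assumption for illustration rather than the paper's dynamic bonus, and the opponent modeling head is omitted.

```python
# Risk-seeking action values from quantile estimates of the return distribution.
import numpy as np

def risk_seeking_values(quantiles, tail=0.25, beta=1.0):
    """quantiles: (num_actions, num_quantiles) sorted return quantiles."""
    mean = quantiles.mean(axis=1)                        # risk-neutral estimate
    k = max(1, int(tail * quantiles.shape[1]))
    upper_tail = quantiles[:, -k:].mean(axis=1)          # optimistic tail estimate
    return mean + beta * (upper_tail - mean)             # add a risk-seeking bonus

# Two actions: "safe" (narrow distribution) vs "risky" (wide, higher upside).
q_safe = np.linspace(0.9, 1.1, 32)
q_risky = np.linspace(-1.0, 3.0, 32)
values = risk_seeking_values(np.stack([q_safe, q_risky]))
print(values, "-> choose action", int(values.argmax()))
```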
Towards Practical Autonomous Flight Simulation for Flapping Wing Biomimetic Robots with Experimental Validation
Qian, Chen, Fang, Yongchun, Jia, Fan, Yan, Jifu, Liang, Yiming, Li, Tiefeng
Reliable flapping wing robot simulation is essential for developing flapping wing mechanisms and algorithms. This paper presents a novel application-oriented flapping wing simulation platform, highly compatible with various mechanical designs and adaptable to different robotic tasks. First, blade element theory and the quasi-steady model are adopted to compute the flapping wing aerodynamics based on wing kinematics. Translational lift, translational drag, rotational lift, and added mass force are all considered in the computation. Then we use the proposed simulation platform to investigate the passive wing rotation and the wing-tail interaction phenomena of a particular flapping wing robot. With the help of the simulation tool and a novel statistic based on dynamic differences from the averaged system, the essence of several behaviors is revealed by investigating the dynamic characteristics of the flapping wing robot. After that, the attitude tracking control problem and the positional trajectory tracking problem are both addressed by robust control techniques. Further comparative simulations reveal that the proposed control algorithms clearly outperform existing ones. Moreover, with the same control algorithm and parameters tuned in simulation, we conduct real flight experiments on a self-made flapping wing robot and obtain results similar to those from the proposed simulation platform. In contrast to existing simulation tools, the proposed one is compatible with most existing flapping wing robots, and allows each subtle behavior in the corresponding applications to be examined in detail by observing the aerodynamic forces and torques on each blade element.
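The blade element / quasi-steady computation above sums per-element aerodynamic forces along the wing. The sketch below shows only the translational lift and drag terms; the coefficient fits, wing geometry, and flapping rate are generic placeholders rather than the paper's identified model, and rotational lift and added mass force are omitted.

```python
# Blade-element / quasi-steady translational lift and drag sketch.
import numpy as np

RHO = 1.225  # air density, kg/m^3

def element_forces(u, alpha, chord, dr):
    """Quasi-steady lift/drag (N) on one blade element of width dr at speed u."""
    q = 0.5 * RHO * u**2 * chord * dr            # dynamic pressure times element area
    c_l = 1.8 * np.sin(2.0 * alpha)              # placeholder lift coefficient fit
    c_d = 0.4 + 1.5 * np.sin(alpha)**2           # placeholder drag coefficient fit
    return q * c_l, q * c_d

def wing_forces(omega, alpha, span=0.15, chord=0.05, n_elems=20):
    """Sum element forces along a rigid flapping wing rotating at rate omega."""
    r = (np.arange(n_elems) + 0.5) * span / n_elems   # element mid-span radii
    lift = drag = 0.0
    for ri in r:
        l, d = element_forces(omega * ri, alpha, chord, span / n_elems)
        lift += l
        drag += d
    return lift, drag

print(wing_forces(omega=30.0, alpha=np.radians(35)))
```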
Learning Generalizable Risk-Sensitive Policies to Coordinate in Decentralized Multi-Agent General-Sum Games
Liu, Ziyi, Guo, Xian, Fang, Yongchun
While various multi-agent reinforcement learning methods have been proposed for cooperative settings, few works investigate how self-interested learning agents achieve mutual coordination in decentralized general-sum games and generalize pre-trained policies to non-cooperative opponents during execution. In this paper, we present the Generalizable Risk-Sensitive Policy (GRSP). GRSP learns the distributions over the agent's return and estimates a dynamic risk-seeking bonus to discover risky coordination strategies. Furthermore, to avoid overfitting to training opponents, GRSP learns an auxiliary opponent modeling task to infer opponents' types and dynamically alter the corresponding strategies during execution. Empirically, agents trained via GRSP can stably achieve mutual coordination during training and avoid being exploited by non-cooperative opponents during execution. To the best of our knowledge, it is the first method to learn coordination strategies between agents in both the iterated prisoner's dilemma (IPD) and the iterated stag hunt (ISH) without shaping opponents or rewards, and the first to consider generalization during execution. Furthermore, we show that GRSP can be scaled to high-dimensional settings.
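The opponent modeling step above infers opponent types online from observed behavior. The paper learns this as an auxiliary task; the sketch below merely illustrates the execution-time idea with a hypothetical two-type Bayesian update in the iterated prisoner's dilemma, where the type set and cooperation probabilities are assumptions rather than anything taken from the paper.

```python
# Hypothetical Bayesian opponent-type inference in the iterated prisoner's dilemma.
TYPES = {"cooperator": 0.9, "defector": 0.1}   # assumed P(opponent cooperates | type)

def update_belief(belief, opponent_cooperated):
    """One Bayes step over opponent types from a single observed action."""
    post = {}
    for t, p_coop in TYPES.items():
        likelihood = p_coop if opponent_cooperated else 1.0 - p_coop
        post[t] = belief[t] * likelihood
    z = sum(post.values())
    return {t: v / z for t, v in post.items()}

belief = {"cooperator": 0.5, "defector": 0.5}
for a in [True, True, False, True]:            # observed opponent actions
    belief = update_belief(belief, a)
print(belief)                                   # belief leans toward "cooperator"
```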