Safety Embedded Adaptive Control Using Barrier States
AL-Sunni, Maitham F., Almubarak, Hassan, Dolan, John M.
In this work, we explore the application of barrier states (BaS) in the realm of safe nonlinear adaptive control. Our proposed framework derives barrier states for systems with parametric uncertainty, which are augmented into the uncertain dynamical model. We employ an adaptive nonlinear control strategy based on a control Lyapunov functions approach to design a stabilizing controller for the augmented system. The developed theory shows that the controller ensures safe control actions for the original system while meeting specified performance objectives. We validate the effectiveness of our approach through simulations on diverse systems, including a planar quadrotor subject to unknown drag forces and an adaptive cruise control system, for which we provide comparisons with existing methodologies. Safe control methods have increasingly gained attention in recent research due to their importance in ensuring system reliability. Many of these methods rely on the notion of set invariance and detailed system models to maintain safety.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Colorado > Denver County > Denver (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Deep Q-Learning with Gradient Target Tracking
Lee, Donghwan, Park, Bum Geun, Lee, Taeho
This paper introduces Q-learning with gradient target tracking, a novel reinforcement learning framework that provides a learned continuous target update mechanism as an alternative to the conventional hard update paradigm. In the standard deep Q-network (DQN), the target network is a copy of the online network's weights, held fixed for a number of iterations before being periodically replaced via a hard update. While this stabilizes training by providing consistent targets, it introduces a new challenge: the hard update period must be carefully tuned to achieve optimal performance. To address this issue, we propose two gradient-based target update methods: DQN with asymmetric gradient target tracking (AGT2-DQN) and DQN with symmetric gradient target tracking (SGT2-DQN). These methods replace the conventional hard target updates with continuous and structured updates using gradient descent, which effectively eliminates the need for manual tuning. We provide a theoretical analysis proving the convergence of these methods in tabular settings. Additionally, empirical evaluations demonstrate their advantages over standard DQN baselines, which suggest that gradient-based target updates can serve as an effective alternative to conventional target update mechanisms in Q-learning.
- North America > United States (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
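To make the contrast in the gradient target tracking abstract above concrete, the sketch below compares a periodic hard update with a continuous update obtained by gradient descent on the squared distance between target and online weights. It is an illustration only: the abstract does not give the AGT2-DQN or SGT2-DQN update rules, so the loss, the function names, and the step size here are assumptions, and the "networks" are plain weight lists.

```python
def hard_update(target, online):
    """Conventional DQN rule: overwrite the target with a copy of the online weights."""
    return list(online)


def tracking_step(target, online, lr):
    """One gradient-descent step on 0.5 * ||target - online||^2 with respect to target.

    Assumed illustrative loss; not the update rule from the paper.
    """
    return [t - lr * (t - o) for t, o in zip(target, online)]


online = [1.0, -2.0, 0.5]        # stand-in for the online network weights
target_hard = [0.0, 0.0, 0.0]
target_soft = [0.0, 0.0, 0.0]
period = 10                      # hard-update period that would need tuning

for step in range(1, 31):
    if step % period == 0:
        target_hard = hard_update(target_hard, online)
    target_soft = tracking_step(target_soft, online, lr=0.2)

# Largest remaining gap between the continuously tracked target and the online weights.
gap = max(abs(t - o) for t, o in zip(target_soft, online))
```

With a fixed online network the tracked target approaches the online weights geometrically, while the hard-updated target jumps only at multiples of `period`.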
Certified Robust Invariant Polytope Training in Neural Controlled ODEs
Harapanahalli, Akash, Coogan, Samuel
We consider a nonlinear control system modeled as an ordinary differential equation subject to disturbance, with a state feedback controller parameterized as a feedforward neural network. We propose a framework for training controllers with certified robust forward invariant polytopes, where any trajectory initialized inside the polytope remains within the polytope, regardless of the disturbance. First, we parameterize a family of lifted control systems in a higher dimensional space, where the original neural controlled system evolves on an invariant subspace of each lifted system. We use interval analysis and neural network verifiers to further construct a family of lifted embedding systems, carefully capturing the knowledge of this invariant subspace. If the vector field of any lifted embedding system satisfies a sign constraint at a single point, then a certain convex polytope of the original system is robustly forward invariant. Treating the neural network controller and the lifted system parameters as variables, we propose an algorithm to train controllers with certified forward invariant polytopes in the closed-loop control system. Through two examples, we demonstrate how the simplicity of the sign constraint allows our approach to scale with system dimension to over $50$ states, and outperform state-of-the-art Lyapunov-based sampling approaches in runtime.
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Asia > Middle East > Jordan (0.04)
Finite-Time Analysis of Simultaneous Double Q-learning
Q-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, it is prone to overestimation bias in the Q-learning update. To address this issue, double Q-learning employs two independent Q-estimators which are randomly selected and updated during the learning process. This paper proposes a modified double Q-learning, called simultaneous double Q-learning (SDQ), with its finite-time analysis. SDQ eliminates the need for random selection between the two Q-estimators, and this modification allows us to analyze double Q-learning through the lens of a novel switching system framework facilitating efficient finite-time analysis. Empirical studies demonstrate that SDQ converges faster than double Q-learning while retaining the ability to mitigate the maximization bias. Finally, we derive a finite-time expected error bound for SDQ.
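For reference, the randomly selected update that the abstract above attributes to double Q-learning can be sketched in tabular form as follows. This is the standard double Q-learning step, not SDQ, whose exact update the abstract does not state; the table layout, learning rate, and toy transition are assumptions made for illustration.

```python
import random


def double_q_update(qa, qb, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=random):
    """Standard double Q-learning step.

    One table is chosen at random to be updated; its greedy action at the
    next state is evaluated with the other table.
    """
    if rng.random() < 0.5:
        learner, evaluator = qa, qb
    else:
        learner, evaluator = qb, qa
    greedy = max(range(len(learner[s_next])), key=lambda b: learner[s_next][b])
    td_target = r + gamma * evaluator[s_next][greedy]
    learner[s][a] += alpha * (td_target - learner[s][a])


# Toy problem (assumed): one rewarded transition from state 0 to state 1.
n_states, n_actions = 2, 2
qa = [[0.0] * n_actions for _ in range(n_states)]
qb = [[0.0] * n_actions for _ in range(n_states)]
rng = random.Random(0)
for _ in range(200):
    double_q_update(qa, qb, s=0, a=1, r=1.0, s_next=1, rng=rng)
```

Because state 1 is never updated in this toy setting, both estimates of the rewarded action drift toward the one-step reward.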
Rocket Landing Control with Grid Fins and Path-following using MPC
In this project, we attempt to optimize the landing trajectory of a rocket. The goal is to minimize the total fuel consumption during the landing process using different techniques. Once an optimal and feasible trajectory is generated using a batch approach, we attempt to follow the path using a Model Predictive Control (MPC) based algorithm, called Trajectory Optimizing Path following Estimation from Demonstration (TOPED), in order to generalize to similar initial states and models, where we introduce a novel cost function for the MPC to solve. We further show that TOPED can follow a demonstration trajectory well in practice under model mismatch and different initial states.
Learning effective dynamics from data-driven stochastic systems
Feng, Lingyu, Gao, Ting, Dai, Min, Duan, Jinqiao
Numerous complex systems in science, engineering, chemistry, and materials science exhibit multiscale properties in their dynamic evolution [1-4]. By considering models at different scales simultaneously, we would like to obtain both the efficiency of macroscopic models and the accuracy of microscopic models. For example, approaches in chemistry usually involve quantum mechanics models in the reaction region and classical molecular models elsewhere [5]. Moreover, because noisy observations arise in all kinds of systems under internal or external factors, stochastic dynamical systems play an important role in modeling such phenomena. Thus, it is of great importance to study multiscale stochastic dynamical systems [5, 6]. To better understand the intrinsic nature of such complex systems, researchers usually investigate their effective dynamics, such as invariant manifolds, global attractors, tipping points, noise-induced bifurcations, and transition pathways [7-11]. These dynamical behaviors can capture the fundamental structures as the system evolves over time or parameter space.
- Asia > China > Hubei Province > Wuhan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Forward Invariance in Neural Network Controlled Systems
Harapanahalli, Akash, Jafarpour, Saber, Coogan, Samuel
We present a framework based on interval analysis and monotone systems theory to certify and search for forward invariant sets in nonlinear systems with neural network controllers. The framework (i) constructs localized first-order inclusion functions for the closed-loop system using Jacobian bounds and existing neural network verification tools; (ii) builds a dynamical embedding system where its evaluation along a single trajectory directly corresponds with a nested family of hyper-rectangles provably converging to an attractive set of the original system; (iii) utilizes linear transformations to build families of nested paralleletopes with the same properties. The framework is automated in Python using our interval analysis toolbox $\texttt{npinterval}$, in conjunction with the symbolic arithmetic toolbox $\texttt{sympy}$, demonstrated on an $8$-dimensional leader-follower system.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
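The idea of certifying a forward invariant hyper-rectangle with interval analysis, as in the entry above, can be illustrated with a small face-by-face sign check. The sketch is written in plain Python and does not use `npinterval` or `sympy`; the `Interval` class, the example vector field `field`, and the disturbance bound are all assumptions chosen for illustration, and the natural interval extension used here is more conservative than the first-order inclusion functions the abstract describes.

```python
def _as_interval(x):
    return x if isinstance(x, Interval) else Interval(x)


class Interval:
    """Closed interval [lo, hi] with just enough arithmetic for this sketch."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    __radd__ = __add__

    def __neg__(self):
        return Interval(-self.hi, -self.lo)

    def __mul__(self, other):
        other = _as_interval(other)
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    __rmul__ = __mul__


def field(x, w):
    """Assumed example dynamics with an additive disturbance w on the first state."""
    return [-x[0] + 0.5 * x[1] + w,
            -x[1] + 0.5 * (x[0] * x[0])]


def is_forward_invariant(f, box, w):
    """Sufficient check: on each face the vector field must not point outward."""
    for i, (lo, hi) in enumerate(box):
        for bound, is_upper in ((lo, False), (hi, True)):
            face = [Interval(a, b) for a, b in box]
            face[i] = Interval(bound)          # pin coordinate i to the face
            fi = f(face, w)[i]
            if is_upper and fi.hi > 0:
                return False
            if not is_upper and fi.lo < 0:
                return False
    return True


w = Interval(-0.1, 0.1)
small_ok = is_forward_invariant(field, [(-1.0, 1.0), (-1.0, 1.0)], w)
large_ok = is_forward_invariant(field, [(-3.0, 3.0), (-3.0, 3.0)], w)
```

A result of `False` only means the check could not certify the box; interval over-approximation can reject boxes that are in fact invariant.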
A large-scale particle system with independent jumps and distributed synchronization
Baryshnikov, Yuliy, Stolyar, Alexander
We study a system consisting of $n$ particles, moving forward in jumps on the real line. Each particle can make either independent jumps, whose sizes have some distribution, or ``synchronization'' jumps, which allow it to join a randomly chosen other particle if the latter happens to be ahead of it. The mean-field asymptotic regime, where $n\to\infty$, is considered. As $n\to\infty$, we prove the convergence of the system dynamics to that of a deterministic mean-field limit (MFL). We obtain results on the average speed of advance of a ``benchmark'' MFL (BMFL) and the liminf of the steady-state speed of advance, in terms of MFLs that are traveling waves. For the special case of exponentially distributed independent jump sizes, we prove that a traveling wave MFL with speed $v$ exists if and only if $v\ge v_*$, with $v_*$ having a simple explicit form; this allows us to show that the average speed of the BMFL is equal to $v_*$ and the liminf of the steady-state speeds is lower bounded by $v_*$. Finally, we put forward a conjecture that both the average speed of the BMFL and the exact limit of the steady-state speeds, under a general distribution of the independent jump size, are equal to a number $v_{**}$, which is easily found from a ``minimum speed principle.'' This general conjecture is consistent with our results for the exponentially distributed jumps and is confirmed by simulations.
- North America > United States > New Jersey > Hudson County > Hoboken (0.14)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
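The particle dynamics described in the entry above can be explored with a rough event-by-event simulation. The abstract does not state the jump rates, so the sketch assumes that each event picks a particle uniformly at random and makes an independent jump with probability `p_independent`, and a synchronization attempt otherwise; exponential jump sizes correspond to the special case the abstract mentions. All names and parameter values are hypothetical.

```python
import random


def simulate(n=200, events=20000, p_independent=0.5, mean_jump=1.0, seed=0):
    """Simulate n particles on the real line, all starting at 0.

    Assumed event model: a uniformly chosen particle either jumps forward by
    an exponential amount or tries to join another uniformly chosen particle.
    """
    rng = random.Random(seed)
    positions = [0.0] * n
    for _ in range(events):
        i = rng.randrange(n)
        if rng.random() < p_independent:
            # Independent jump with exponentially distributed size.
            positions[i] += rng.expovariate(1.0 / mean_jump)
        else:
            # Synchronization jump: join particle j only if it is ahead.
            j = rng.randrange(n)
            if positions[j] > positions[i]:
                positions[i] = positions[j]
    return positions


final = simulate(n=50, events=5000, seed=1)
mean_position = sum(final) / len(final)
```

Dividing `mean_position` by the number of events per particle gives a crude estimate of the speed of advance under these assumed rates.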
Relative ultra-wideband based localization of multi-robot systems with kinematic extended Kalman filter
Ichekhlef, Salma, Villemure, Étienne, Naderi, Shokoufeh, Ferland, François, Blondin, Maude
Localization plays a critical role in the field of distributed swarm robotics. Previous work has highlighted the potential of relative localization for position tracking in multi-robot systems. Ultra-wideband (UWB) technology provides a good estimation of the relative position between robots but suffers from some limitations. This paper proposes improving the relative localization functionality developed in our previous work, which is based on UWB technology. Our new approach merges UWB telemetry and a kinematic model into an extended Kalman filter to properly track the relative position of robots. We performed a simulation and validated the improvements in relative distance and angle accuracy for the proposed approach. An additional analysis was conducted to observe the increase in performance when the robots share their control inputs.
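As a rough illustration of fusing a kinematic prediction with a UWB measurement in an extended Kalman filter, the sketch below tracks a 2D relative position and corrects it with a single range reading. The state, the noise values, and the range-only measurement model are assumptions made for brevity; the abstract does not specify the authors' filter, which also concerns relative angle.

```python
import math


def ekf_predict(x, P, v_rel, dt, q):
    """Kinematic prediction: the relative position advances with relative velocity."""
    x_pred = [x[0] + v_rel[0] * dt, x[1] + v_rel[1] * dt]
    P_pred = [[P[0][0] + q, P[0][1]], [P[1][0], P[1][1] + q]]
    return x_pred, P_pred


def ekf_update_range(x, P, z, r_var):
    """Correct the estimate with one range measurement z = ||x|| + noise.

    Assumes the predicted range is nonzero so the Jacobian is defined.
    """
    dist = math.hypot(x[0], x[1])
    H = [x[0] / dist, x[1] / dist]                  # Jacobian of range w.r.t. x
    PHt = [P[0][0] * H[0] + P[0][1] * H[1],
           P[1][0] * H[0] + P[1][1] * H[1]]
    S = H[0] * PHt[0] + H[1] * PHt[1] + r_var       # innovation variance
    K = [PHt[0] / S, PHt[1] / S]                    # Kalman gain
    innovation = z - dist
    x_new = [x[0] + K[0] * innovation, x[1] + K[1] * innovation]
    HP = [H[0] * P[0][0] + H[1] * P[1][0],
          H[0] * P[0][1] + H[1] * P[1][1]]
    # Covariance update P_new = (I - K H) P.
    P_new = [[P[0][0] - K[0] * HP[0], P[0][1] - K[0] * HP[1]],
             [P[1][0] - K[1] * HP[0], P[1][1] - K[1] * HP[1]]]
    return x_new, P_new


x, P = [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
x, P = ekf_predict(x, P, v_rel=[0.0, 0.0], dt=0.1, q=0.0)
x, P = ekf_update_range(x, P, z=2.0, r_var=0.01)
```

A single range reading only reduces uncertainty along the line of sight, which is one reason a kinematic model and shared control inputs can help resolve the remaining direction.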
Discovering dynamical features of Hodgkin-Huxley-type model of physiological neuron using artificial neural network
Kuptsov, Pavel V., Stankevich, Nataliya V., Bagautdinova, Elmira R.
We consider a Hodgkin-Huxley-type model that is a stiff ODE system with two fast variables and one slow variable. For the parameter ranges under consideration, the original version of the model has an unstable fixed point and an oscillating attractor that demonstrates a bifurcation from bursting to spiking dynamics. A modified version is also considered in which bistability occurs: an area appears in the parameter space where the fixed point becomes stable and coexists with the bursting attractor. For these two systems we create artificial neural networks that are able to reproduce their dynamics. The created networks operate as recurrent maps and are trained on trajectory cuts sampled at random parameter values within a certain range. Although the networks are trained only on oscillatory trajectory cuts, they also discover the fixed point of the considered systems. Its position and even its eigenvalues coincide very well with those of the fixed point of the initial ODEs. For the bistable model, this means that the network, trained only on one branch of the solutions, recovers the other branch without seeing it during training. These results, as we see it, can trigger the development of new approaches to the reconstruction and discovery of complex dynamics. From a practical point of view, reproducing dynamics with a neural network can be considered a sort of alternative method of numerical modeling intended for use with contemporary parallel hardware and software.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Russia > Volga Federal District > Nizhny Novgorod Oblast > Nizhny Novgorod (0.04)
- Asia > Russia (0.04)