nlf
Neural Lyapunov Function Approximation with Self-Supervised Reinforcement Learning
McCutcheon, Luc, Gharesifard, Bahman, Fallah, Saber
Control Lyapunov functions are traditionally used to design a controller which ensures convergence to a desired state, yet deriving these functions for nonlinear systems remains a complex challenge. This paper presents a novel, sample-efficient method for neural approximation of nonlinear Lyapunov functions, leveraging self-supervised Reinforcement Learning (RL) to enhance training data generation, particularly for inaccurately represented regions of the state space. The proposed approach employs a data-driven World Model to train Lyapunov functions from off-policy trajectories. The method is validated on both standard and goal-conditioned robotic tasks, demonstrating faster convergence and higher approximation accuracy compared to the state-of-the-art neural Lyapunov approximation baseline. The code is available at: https://github.com/CAV-Research-Lab/SACLA.git
Meta-Learning-Based Adaptive Stability Certificates for Dynamical Systems
Jena, Amit, Kalathil, Dileep, Xie, Le
Stability assessment of non-linear systems and ensuring their safe and reliable operation are of paramount importance in any real-world engineering system. While learning-based control schemes have received a lot of attention recently, the lack of stability guarantees is a fundamental issue that prevents their wide-scale deployment in the real world. One standard approach to estimate the stability region of a general nonlinear system is to first find a Lyapunov function for the system and characterize its region of attraction (ROA) as the stability region (Khalil 2015). A closed-loop system is stable in the sense of Lyapunov if the system trajectory converges to the origin as long as the initial condition is inside the ROA. The sum-of-squares approach is one popular method [...]
The trained NLF can then be used for the stability estimation of the real-world system. However, this approach will fail if the real-world system dynamics is different from the model used for training the NLF. At the same time, the real-world system model can be different from the model estimated from the collected data due to various reasons, such as estimation error and changes in the system parameters over time. Repeating the training procedure every time whenever there is such a parametric mismatch turns impractical due to the unavailability of necessary data samples and the need to get a quick stability assessment. Thus, learning a neural Lyapunov function for a real-world system using only a small number of data samples and through a few gradient updates, remains an open problem.
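The idea of finding a Lyapunov function and characterizing its region of attraction can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy scalar system x' = -x + x^3, the hand-picked candidate V(x) = x^2 (standing in for a trained NLF), and the helper name `estimate_roa_level`. The code samples states, finds where the decrease condition dV/dt < 0 fails, and reports the largest level c for which the sublevel set {V < c} contains no sampled violation. A sample-based check like this suggests an ROA estimate but does not prove one.

```python
def f(x: float) -> float:
    """Toy dynamics x' = -x + x**3 (assumed example, not from the paper)."""
    return -x + x**3


def V(x: float) -> float:
    """Candidate Lyapunov function V(x) = x**2 (hand-picked stand-in for an NLF)."""
    return x * x


def V_dot(x: float) -> float:
    """Time derivative of V along trajectories: dV/dx * f(x) = 2 x f(x)."""
    return 2.0 * x * f(x)


def estimate_roa_level(samples: list[float]) -> float:
    """Return the smallest V value among sampled nonzero states where V_dot >= 0.

    Every sampled nonzero state with V(x) below this level satisfies V_dot < 0,
    so {x : V(x) < c} is a sample-based (unverified) ROA estimate.
    """
    violating = [V(x) for x in samples if abs(x) > 1e-9 and V_dot(x) >= 0.0]
    if not violating:
        # No violation found: the estimate is limited only by the sampled range.
        return max(V(x) for x in samples)
    return min(violating)


# Uniform grid on [-2, 2] with spacing 0.001.
samples = [-2.0 + 0.001 * i for i in range(4001)]
c = estimate_roa_level(samples)
print(f"estimated level c = {c:.3f}")
```

For this toy system the decrease condition holds exactly for 0 < |x| < 1, so the estimated level should land close to 1, up to the grid spacing. If the true dynamics differed from `f`, the same V could give a misleading estimate, which is the model-mismatch problem the excerpt describes.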
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Chen, Yangyi, Sikka, Karan, Cogswell, Michael, Ji, Heng, Divakaran, Ajay
We present DRESS, a large vision language model (LVLM) that innovatively exploits Natural Language feedback (NLF) from Large Language Models to enhance its alignment and interactions by addressing two key limitations in the state-of-the-art LVLMs. First, prior LVLMs generally rely only on the instruction finetuning stage to enhance alignment with human preferences. Without incorporating extra feedback, they are still prone to generate unhelpful, hallucinated, or harmful responses. Second, while the visual instruction tuning data is generally structured in a multi-turn dialogue format, the connections and dependencies among consecutive conversational turns are weak. This reduces the capacity for effective multi-turn interactions. To tackle these, we propose a novel categorization of the NLF into two key types: critique and refinement. The critique NLF identifies the strengths and weaknesses of the responses and is used to align the LVLMs with human preferences. The refinement NLF offers concrete suggestions for improvement and is adopted to improve the interaction ability of the LVLMs, which focuses on LVLMs' ability to refine responses by incorporating feedback in multi-turn interactions. To address the non-differentiable nature of NLF, we generalize conditional reinforcement learning for training. Our experimental results demonstrate that DRESS can generate more helpful (9.76%), honest (11.52%), and harmless (21.03%) responses, and more effectively learn from feedback during multi-turn interactions compared to SOTA LVLMs.