Geothermal System for Power Generation
Closed-loop underwater soft robotic foil shape control using flexible e-skin
Micklem, Leo, Dong, Huazhi, Giorgio-Serchi, Francesco, Yang, Yunjie, Weymouth, Gabriel D., Thornton, Blair
The use of soft robotics for real-world underwater applications is limited, even more than in terrestrial applications, by the ability to accurately measure and control the deformation of the soft materials in real time without the need for feedback from an external sensor. Real-time underwater shape estimation would allow for accurate closed-loop control of soft propulsors, enabling high-performance swimming and manoeuvring. We propose and demonstrate a method for closed-loop underwater soft robotic foil control based on a flexible capacitive e-skin and machine learning which does not necessitate feedback from an external sensor. The underwater e-skin is applied to a highly flexible foil undergoing deformations from 2% to 9% of its camber by means of soft hydraulic actuators. Accurate set point regulation of the camber is successfully tracked during sinusoidal and triangle actuation routines with an amplitude of 5% peak-to-peak and 10-second period with a normalised RMS error of 0.11, and 2% peak-to-peak amplitude with a period of 5 seconds with a normalised RMS error of 0.03. The tail tip deflection can be measured across a 30 mm (0.15 chords) range. These results pave the way for using e-skin technology for underwater soft robotic closed-loop control applications.
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving
Yang, Xuemeng, Wen, Licheng, Ma, Yukai, Mei, Jianbiao, Li, Xin, Wei, Tiantian, Lei, Wenjie, Fu, Daocheng, Cai, Pinlong, Dou, Min, Shi, Botian, He, Liang, Liu, Yong, Qiao, Yu
This paper presented DriveArena, the first high-fidelity closed-loop simulation system designed for driving agents navigating in real scenarios. DriveArena features a flexible, modular architecture, allowing for the seamless interchange of its core components: Traffic Manager, a traffic simulator capable of generating realistic traffic flow on any worldwide street map, and World Dreamer, a high-fidelity conditional generative model with infinite autoregression. This powerful synergy empowers any driving agent capable of processing real-world images to navigate in DriveArena's simulated environment. The agent perceives its surroundings through images generated by World Dreamer and output trajectories. These trajectories are fed into Traffic Manager, achieving realistic interactions with other vehicles and producing a new scene layout. Finally, the latest scene layout is relayed back into World Dreamer, perpetuating the simulation cycle. This iterative process fosters closed-loop exploration within a highly realistic environment, providing a valuable platform for developing and evaluating driving agents across diverse and challenging scenarios. DriveArena signifies a substantial leap forward in leveraging generative image data for the driving simulation platform, opening insights for closed-loop autonomous driving. Code will be available soon on GitHub: https://github.com/PJLab-ADG/DriveArena
Closed-loop Diffusion Control of Complex Physical Systems
Wei, Long, Feng, Haodong, Hu, Peiyan, Zhang, Tao, Yang, Yuchen, Zheng, Xiang, Feng, Ruiqi, Fan, Dixia, Wu, Tailin
The control problems of complex physical systems have wide applications in science and engineering. Several previous works have demonstrated that generative control methods based on diffusion models have significant advantages for solving these problems. However, existing generative control methods face challenges in handling closed-loop control, which is an inherent constraint for effective control of complex physical systems. In this paper, we propose a C losed-L oop Diff usion method for Phy sical systems Con trol (CL-DiffPhyCon). By adopting an asynchronous denoising schedule for different time steps, CL-DiffPhyCon generates control signals conditioned on real-time feedback from the environment. Thus, CL-DiffPhyCon is able to speed up diffusion control methods in a closed-loop framework. We evaluate CL-DiffPhyCon on the 1D Burgers' equation control and 2D incompressible fluid control tasks. The results demonstrate that CL-DiffPhyCon achieves notable control performance with significant sampling acceleration. The control problem of complex physical systems is a critical area of study that involves optimizing a sequence of control actions to achieve specific objectives. It has important applications across a wide range of science and engineering fields, including fluid control (V erma et al., 2018), plasma control (Degrave et al., 2022), and particle dynamics control (Reyes Garza et al., 2023). The challenge in controlling such systems arises from their high-dimensional, highly nonlinear, and stochastic characteristics. Therefore, to achieve effective performance, there is an inherent requirement of closed-loop control.
Rethinking Closed-loop Planning Framework for Imitation-based Model Integrating Prediction and Planning
Guo, Jiayu, Feng, Mingyue, Zhu, Pengfei, Li, Chengjun, Pu, Jian
In recent years, the integration of prediction and planning through neural networks has received substantial attention. Despite extensive studies on it, there is a noticeable gap in understanding the operation of such models within a closed-loop planning setting. To bridge this gap, we propose a novel closed-loop planning framework compatible with neural networks engaged in joint prediction and planning. The framework contains two running modes, namely planning and safety monitoring, wherein the neural network performs Motion Prediction and Planning (MPP) and Conditional Motion Prediction (CMP) correspondingly without altering architecture. We evaluate the efficacy of our framework using the nuPlan dataset and its simulator, conducting closed-loop experiments across diverse scenarios. The results demonstrate that the proposed framework ensures the feasibility and local stability of the planning process while maintaining safety with CMP safety monitoring. Compared to other learning-based methods, our approach achieves substantial improvement.
Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models
Wu, Junfei, Liu, Qiang, Wang, Ding, Zhang, Jinghao, Wu, Shu, Wang, Liang, Tan, Tieniu
Object hallucination has been an Achilles' heel which hinders the broader applications of large vision-language models (LVLMs). Object hallucination refers to the phenomenon that the LVLMs claim non-existent objects in the image. To mitigate the object hallucinations, instruction tuning and external model-based detection methods have been proposed, which either require large-scare computational resources or depend on the detection result of external models. However, there remains an under-explored field to utilize the LVLM itself to alleviate object hallucinations. In this work, we adopt the intuition that the LVLM tends to respond logically consistently for existent objects but inconsistently for hallucinated objects. Therefore, we propose a Logical Closed Loop-based framework for Object Hallucination Detection and Mitigation, namely LogicCheckGPT. In specific, we devise logical consistency probing to raise questions with logical correlations, inquiring about attributes from objects and vice versa. Whether their responses can form a logical closed loop serves as an indicator of object hallucination. As a plug-and-play method, it can be seamlessly applied to all existing LVLMs. Comprehensive experiments conducted on three benchmarks across four LVLMs have demonstrated significant improvements brought by our method, indicating its effectiveness and generality.
Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning
Nguyen, Huy Hoang, Vu, Minh Nhat, Beck, Florian, Ebmer, Gerald, Nguyen, Anh, Kugi, Andreas
Combining a vision module inside a closed-loop control system for a \emph{seamless movement} of a robot in a manipulation task is challenging due to the inconsistent update rates between utilized modules. This task is even more difficult in a dynamic environment, e.g., objects are moving. This paper presents a \emph{modular} zero-shot framework for language-driven manipulation of (dynamic) objects through a closed-loop control system with real-time trajectory replanning and an online 6D object pose localization. We segment an object within $\SI{0.5}{\second}$ by leveraging a vision language model via language commands. Then, guided by natural language commands, a closed-loop system, including a unified pose estimation and tracking and online trajectory planning, is utilized to continuously track this object and compute the optimal trajectory in real-time. Our proposed zero-shot framework provides a smooth trajectory that avoids jerky movements and ensures the robot can grasp a non-stationary object. Experiment results exhibit the real-time capability of the proposed zero-shot modular framework for the trajectory optimization module to accurately and efficiently grasp moving objects, i.e., up to \SI{30}{\hertz} update rates for the online 6D pose localization module and \SI{10}{\hertz} update rates for the receding-horizon trajectory optimization. These advantages highlight the modular framework's potential applications in robotics and human-robot interaction; see the video in https://www.acin.tuwien.ac.at/en/6e64/.
CarLLaVA: Vision language models for camera-only closed-loop driving
Renz, Katrin, Chen, Long, Marcu, Ana-Maria, Hรผnermann, Jan, Hanotte, Benoit, Karnsund, Alice, Shotton, Jamie, Arani, Elahe, Sinavski, Oleg
In this technical report, we present CarLLaVA, a Vision Language Model (VLM) for autonomous driving, developed for the CARLA Autonomous Driving Challenge 2.0. CarLLaVA uses the vision encoder of the LLaVA VLM and the LLaMA architecture as backbone, achieving state-of-the-art closed-loop driving performance with only camera input and without the need for complex or expensive labels. Additionally, we show preliminary results on predicting language commentary alongside the driving output. CarLLaVA uses a semi-disentangled output representation of both path predictions and waypoints, getting the advantages of the path for better lateral control and the waypoints for better longitudinal control. We propose an efficient training recipe to train on large driving datasets without wasting compute on easy, trivial data. CarLLaVA ranks 1st place in the sensor track of the CARLA Autonomous Driving Challenge 2.0 outperforming the previous state of the art by 458% and the best concurrent submission by 32.6%.
Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving
Jia, Xiaosong, Yang, Zhenjie, Li, Qifeng, Zhang, Zhiyuan, Yan, Junchi
In an era marked by the rapid scaling of foundation models, autonomous driving technologies are approaching a transformative threshold where end-to-end autonomous driving (E2E-AD) emerges due to its potential of scaling up in the data-driven manner. However, existing E2E-AD methods are mostly evaluated under the open-loop log-replay manner with L2 errors and collision rate as metrics (e.g., in nuScenes), which could not fully reflect the driving performance of algorithms as recently acknowledged in the community. For those E2E-AD methods evaluated under the closed-loop protocol, they are tested in fixed routes (e.g., Town05Long and Longest6 in CARLA) with the driving score as metrics, which is known for high variance due to the unsmoothed metric function and large randomness in the long route. Besides, these methods usually collect their own data for training, which makes algorithm-level fair comparison infeasible. To fulfill the paramount need of comprehensive, realistic, and fair testing environments for Full Self-Driving (FSD), we present Bench2Drive, the first benchmark for evaluating E2E-AD systems' multiple abilities in a closed-loop manner. Bench2Drive's official training data consists of 2 million fully annotated frames, collected from 10000 short clips uniformly distributed under 44 interactive scenarios (cut-in, overtaking, detour, etc), 23 weathers (sunny, foggy, rainy, etc), and 12 towns (urban, village, university, etc) in CARLA v2. Its evaluation protocol requires E2E-AD models to pass 44 interactive scenarios under different locations and weathers which sums up to 220 routes and thus provides a comprehensive and disentangled assessment about their driving capability under different situations. We implement state-of-the-art E2E-AD models and evaluate them in Bench2Drive, providing insights regarding current status and future directions.
PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning
Zheng, Yupeng, Xing, Zebin, Zhang, Qichao, Jin, Bu, Li, Pengfei, Zheng, Yuhang, Xia, Zhongpu, Zhan, Kun, Lang, Xianpeng, Chen, Yaran, Zhao, Dongbin
Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based vehicle motion planning methods perform satisfactorily in common scenarios but struggle to generalize to long-tailed situations. Meanwhile, learning-based methods have yet to achieve superior performance over rule-based approaches in large-scale closed-loop scenarios. To address these issues, we propose PlanAgent, the first mid-to-mid planning system based on a Multi-modal Large Language Model (MLLM). MLLM is used as a cognitive agent to introduce human-like knowledge, interpretability, and common-sense reasoning into the closed-loop planning. Specifically, PlanAgent leverages the power of MLLM through three core modules. First, an Environment Transformation module constructs a Bird's Eye View (BEV) map and a lane-graph-based textual description from the environment as inputs. Second, a Reasoning Engine module introduces a hierarchical chain-of-thought from scene understanding to lateral and longitudinal motion instructions, culminating in planner code generation. Last, a Reflection module is integrated to simulate and evaluate the generated planner for reducing MLLM's uncertainty. PlanAgent is endowed with the common-sense reasoning and generalization capability of MLLM, which empowers it to effectively tackle both common and complex long-tailed scenarios. Our proposed PlanAgent is evaluated on the large-scale and challenging nuPlan benchmarks. A comprehensive set of experiments convincingly demonstrates that PlanAgent outperforms the existing state-of-the-art in the closed-loop motion planning task. Codes will be soon released.
POLICEd RL: Learning Closed-Loop Robot Control Policies with Provable Satisfaction of Hard Constraints
Bouvier, Jean-Baptiste, Nagpal, Kartik, Mehr, Negar
In this paper, we seek to learn a robot policy guaranteed to satisfy state constraints. To encourage constraint satisfaction, existing RL algorithms typically rely on Constrained Markov Decision Processes and discourage constraint violations through reward shaping. However, such soft constraints cannot offer verifiable safety guarantees. To address this gap, we propose POLICEd RL, a novel RL algorithm explicitly designed to enforce affine hard constraints in closed-loop with a black-box environment. Our key insight is to force the learned policy to be affine around the unsafe set and use this affine region as a repulsive buffer to prevent trajectories from violating the constraint. We prove that such policies exist and guarantee constraint satisfaction. Our proposed framework is applicable to both systems with continuous and discrete state and action spaces and is agnostic to the choice of the RL training algorithm. Our results demonstrate the capacity of POLICEd RL to enforce hard constraints in robotic tasks while significantly outperforming existing methods.