


Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation

arXiv.org Artificial Intelligence

Despite significant progress in robotics and embodied AI in recent years, deploying robots for long-horizon tasks remains a major challenge. The majority of prior work adheres to an open-loop philosophy and lacks real-time feedback, leading to error accumulation and poor robustness. A handful of approaches have attempted to establish feedback mechanisms leveraging pixel-level differences or pre-trained visual representations, yet their efficacy and adaptability have been found to be limited. Inspired by classic closed-loop control systems, we propose CLOVER, a closed-loop visuomotor control framework that incorporates feedback mechanisms to improve adaptive robotic control. CLOVER consists of a text-conditioned video diffusion model that generates visual plans as reference inputs, a measurable embedding space for accurate error quantification, and a feedback-driven controller that refines actions based on feedback and initiates replanning as needed. Our framework shows notable advances in real-world robotic tasks and achieves state-of-the-art performance on the CALVIN benchmark, improving by 8% over previous open-loop counterparts. Code and checkpoints are maintained at https://github.com/OpenDriveLab/CLOVER.
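
The feedback-driven controller can be pictured as comparing the embedding of the current observation with the embedding of the current visual subgoal from the generated plan. The sketch below is a minimal illustration only: it assumes cosine distance as the error measure and two hypothetical thresholds (`advance_tol`, `replan_tol`); these names, values, and the three-way decision rule are our assumptions, not CLOVER's published implementation.

```python
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity between two embedding vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def feedback_decision(obs_emb: np.ndarray, subgoal_emb: np.ndarray,
                      advance_tol: float = 0.1, replan_tol: float = 0.6):
    """Hypothetical decision rule driven by the embedding-space error.

    Returns (decision, error), where decision is "advance", "act", or "replan".
    """
    err = cosine_distance(obs_emb, subgoal_emb)
    if err < advance_tol:
        return "advance", err   # subgoal reached: move on to the next planned frame
    if err > replan_tol:
        return "replan", err    # observation drifted far from the plan: regenerate it
    return "act", err           # keep refining actions toward the current subgoal
```

For example, identical embeddings give an error of 0 and the controller advances, while orthogonal embeddings give an error of 1 and trigger a replan. The thresholds trade off responsiveness against the cost of calling the video diffusion model again.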


Promptable Closed-loop Traffic Simulation

arXiv.org Artificial Intelligence

Simulation stands as a cornerstone of safe and efficient autonomous driving development. At its core, a simulation system ought to produce realistic, reactive, and controllable traffic patterns. In this paper, we propose ProSim, a multimodal promptable closed-loop traffic simulation framework. ProSim allows the user to give a complex set of numerical, categorical, or textual prompts to instruct each agent's behavior and intention. ProSim then rolls out a traffic scenario in a closed-loop manner, modeling each agent's interaction with other traffic participants. Our experiments show that ProSim achieves high prompt controllability given different user prompts, while reaching competitive performance on the Waymo Sim Agents Challenge when no prompt is given. To support research on promptable traffic simulation, we create ProSim-Instruct-520k, a multimodal prompt-scenario paired driving dataset with over 10M text prompts for over 520k real-world driving scenarios. We will release the code of ProSim as well as the data and labeling tools of ProSim-Instruct-520k at https://ariostgx.github.io/ProSim.


Closed-Loop Magnetic Control of Medical Soft Continuum Robots for Deflection

arXiv.org Artificial Intelligence

Magnetic soft continuum robots (MSCRs) have emerged as powerful devices in endovascular interventions owing to their hyperelastic fibre matrix and enhanced magnetic manipulability. Effective closed-loop control of tethered magnetic devices is a step toward autonomous vascular robotic surgery. In this article, we employ a magnetic actuation system equipped with a single rotatable permanent magnet to achieve closed-loop deflection control of the MSCR. To this end, we establish a differential kinematic model of MSCRs exposed to non-uniform magnetic fields. The relationship between the existence and uniqueness of the Jacobian and the relative geometric position of the robots is derived. Simulations show that the control direction induced by the Jacobian is crucial. The corresponding quasi-static control (QSC) framework then integrates a linear extended state observer to estimate model uncertainties. Finally, the effectiveness of the proposed QSC framework is validated through trajectory tracking experiments comparing it with a PD controller under external disturbances. The Jacobian is further extended to path-following control of the distal end position. The proposed control framework prevents the actuator from reaching its joint limits and achieves fast, low-error tracking without overshoot.
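
The linear extended state observer (LESO) mentioned above is a standard building block from active disturbance rejection control: it treats unknown dynamics as an extra state and estimates it online. As a minimal sketch, assuming a toy scalar plant dx/dt = f + b0·u in which f lumps all model uncertainty, a second-order LESO with both observer poles placed at -omega_o (gains beta1 = 2·omega_o, beta2 = omega_o²) recovers both x and f. The plant, gains, and step size are illustrative assumptions, not the paper's MSCR model.

```python
def simulate_leso(f_true: float = 0.5, b0: float = 1.0, u: float = 0.0,
                  omega_o: float = 20.0, dt: float = 1e-3, steps: int = 2000):
    """Toy plant dx/dt = f + b0*u with an unknown constant f, observed by a LESO.

    Returns (true state x, estimated state z1, estimated lumped disturbance z2).
    """
    beta1, beta2 = 2.0 * omega_o, omega_o ** 2  # both observer poles at -omega_o
    x = 0.0            # true plant state
    z1, z2 = 0.0, 0.0  # observer estimates of x and of the lumped disturbance f
    for _ in range(steps):
        e = x - z1                               # output estimation error (measurement y = x)
        z1 += dt * (z2 + b0 * u + beta1 * e)     # extended-state observer, forward Euler
        z2 += dt * (beta2 * e)
        x += dt * (f_true + b0 * u)              # advance the true plant
    return x, z1, z2
```

After two simulated seconds, z2 has converged to the unknown f. In a full controller, the estimate would typically be cancelled in the input (for example u = (u0 - z2) / b0); how the paper couples the LESO with its Jacobian-based quasi-static law is not stated in the abstract.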


Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling

arXiv.org Artificial Intelligence

The increasing availability of human demonstrations has spurred renewed interest in behavioral cloning [1, 2]. In particular, recent studies have highlighted the potential of learning from large-scale demonstrations to acquire a variety of complex skills [3, 4, 5, 6, 7, 8]. However, this approach still struggles with two common properties of human demonstrations: (i) strong temporal dependencies across multiple steps, such as idle pauses [4] and latent strategies [9, 10]; and (ii) large style variability across demonstrations, including differences in proficiency [11] and preference [12]. Both properties are often prevalent yet unlabeled in collected data, posing significant challenges to traditional behavioral cloning, which typically learns a discriminative model that maps an input state to a target action. In response to these challenges, recent works have pursued a generative approach characterized by two key elements: (i) predicting a sequence of actions over multiple time steps and executing all or part of the sequence, known as action chunking [3] or receding horizon [4]; and (ii) modeling the distribution of action chunks and sampling from the learned model in an independent [4, 13] or weakly dependent [3, 14] manner during deployment. Some studies find these elements crucial for learning a performant policy in controlled laboratory scenarios [3, 4], while other recent work reports the opposite outcome under practical conditions [6]. The reasons behind these conflicting results remain unclear.
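
The difference between executing a whole chunk and executing only part of it before re-querying the policy can be seen in a toy rollout. In this sketch, `policy` is a hypothetical stand-in that returns a chunk of four identical corrective actions computed from the current state, and `execute_every` sets how many actions of each chunk run before the next query (the full chunk length for pure action chunking, fewer for receding-horizon execution). It illustrates only the execution schedule, not the Bidirectional Decoding method itself.

```python
from typing import Callable, List, Tuple


def rollout(policy: Callable[[float], List[float]],
            step: Callable[[float, float], float],
            x0: float, total_steps: int, execute_every: int) -> Tuple[float, int]:
    """Run a chunked policy, re-querying it after `execute_every` actions.

    Returns (final state, number of policy queries).
    """
    if execute_every < 1:
        raise ValueError("execute_every must be at least 1")
    x, queries, executed = x0, 0, 0
    while executed < total_steps:
        chunk = policy(x)          # predict a chunk of future actions from the current state
        queries += 1
        for a in chunk[:execute_every]:
            if executed == total_steps:
                break
            x = step(x, a)         # actions within a chunk ignore newer observations
            executed += 1
    return x, queries


def policy(x: float) -> List[float]:
    """Hypothetical policy: a 4-step chunk of the same corrective action."""
    return [-0.5 * x] * 4


def step(x: float, a: float) -> float:
    """Toy dynamics: the action is added to the state."""
    return x + a
```

Starting from x = 1 for eight steps, full chunking (`execute_every=4`) overshoots to -1 and back to 1 with two queries, whereas re-querying every step drives the state to 1/256 with eight queries. Executing less of each chunk buys reactivity at the cost of more frequent policy calls and less temporal consistency between consecutive chunks.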


Closed-loop underwater soft robotic foil shape control using flexible e-skin

arXiv.org Artificial Intelligence

The use of soft robotics in real-world underwater applications is limited, even more than in terrestrial applications, by the difficulty of accurately measuring and controlling the deformation of soft materials in real time without feedback from an external sensor. Real-time underwater shape estimation would allow accurate closed-loop control of soft propulsors, enabling high-performance swimming and manoeuvring. We propose and demonstrate a method for closed-loop underwater soft robotic foil control based on a flexible capacitive e-skin and machine learning that does not require feedback from an external sensor. The underwater e-skin is applied to a highly flexible foil undergoing camber deformations from 2% to 9%, driven by soft hydraulic actuators. The camber set point is tracked accurately during sinusoidal and triangular actuation routines: with an amplitude of 5% peak-to-peak and a 10-second period, the normalised RMS error is 0.11; with an amplitude of 2% peak-to-peak and a 5-second period, it is 0.03. The tail tip deflection can be measured across a 30 mm (0.15 chords) range. These results pave the way for using e-skin technology in closed-loop control of underwater soft robots.
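
A normalised RMS error only has meaning together with its normalisation convention, which the abstract does not state. The sketch below assumes normalisation by the range (max minus min) of the reference signal, one common choice; the paper may normalise differently (for example by amplitude or by the mean), so the numbers it produces are not directly comparable to the reported 0.11 and 0.03.

```python
import math
from typing import Sequence


def nrmse(reference: Sequence[float], measured: Sequence[float]) -> float:
    """RMS tracking error divided by the range of the reference signal (assumed convention)."""
    if not reference or len(reference) != len(measured):
        raise ValueError("signals must be non-empty and of equal length")
    mse = sum((r - m) ** 2 for r, m in zip(reference, measured)) / len(reference)
    span = max(reference) - min(reference)
    if span == 0:
        raise ValueError("reference range is zero; range normalisation is undefined")
    return math.sqrt(mse) / span
```

For instance, a reference of [0, 2] tracked as [1, 1] has an RMS error of 1 and a range of 2, giving 0.5.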


DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

arXiv.org Artificial Intelligence

This paper presents DriveArena, the first high-fidelity closed-loop simulation system designed for driving agents navigating real scenarios. DriveArena features a flexible, modular architecture that allows its core components to be interchanged seamlessly: Traffic Manager, a traffic simulator capable of generating realistic traffic flow on any street map worldwide, and World Dreamer, a high-fidelity conditional generative model with infinite autoregression. Together, these components allow any driving agent capable of processing real-world images to navigate DriveArena's simulated environment. The agent perceives its surroundings through images generated by World Dreamer and outputs trajectories. These trajectories are fed into Traffic Manager, which produces realistic interactions with other vehicles and a new scene layout. Finally, the latest scene layout is relayed back into World Dreamer, continuing the simulation cycle. This iterative process enables closed-loop exploration within a highly realistic environment, providing a valuable platform for developing and evaluating driving agents across diverse and challenging scenarios. DriveArena represents a substantial step forward in leveraging generative image data for driving simulation and offers insights for closed-loop autonomous driving. Code will be available soon on GitHub: https://github.com/PJLab-ADG/DriveArena
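
The simulation cycle described above is a loop over three components: layout to image, image to trajectory, trajectory to new layout. The sketch below wires toy stand-ins together to show that data flow; `world_dreamer`, `driving_agent`, `traffic_manager`, and the `Layout` type are hypothetical placeholders, not names or signatures from the DriveArena codebase.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Layout:
    ego: Point
    others: List[Point] = field(default_factory=list)


def world_dreamer(layout: Layout) -> dict:
    """Stand-in for the generative model: 'renders' an observation from the layout."""
    return {"ego": layout.ego, "others": list(layout.others)}


def driving_agent(image: dict) -> List[Point]:
    """Stand-in agent: plans three waypoints straight ahead, 1 m apart."""
    x, y = image["ego"]
    return [(x + d, y) for d in (1.0, 2.0, 3.0)]


def traffic_manager(layout: Layout, trajectory: List[Point]) -> Layout:
    """Stand-in simulator: ego moves to its first waypoint, other vehicles move 0.5 m."""
    return Layout(ego=trajectory[0],
                  others=[(ox + 0.5, oy) for ox, oy in layout.others])


def simulate(layout: Layout, steps: int) -> Layout:
    for _ in range(steps):
        image = world_dreamer(layout)                 # layout -> generated image
        trajectory = driving_agent(image)             # image -> planned trajectory
        layout = traffic_manager(layout, trajectory)  # trajectory -> new layout
    return layout
```

Because each component only sees the previous component's output, any of them can be swapped (a different agent, a different traffic simulator) without changing the loop, which mirrors the modularity the abstract emphasises.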


Closed-loop Diffusion Control of Complex Physical Systems

arXiv.org Artificial Intelligence

The control problems of complex physical systems have wide applications in science and engineering. Several previous works have demonstrated that generative control methods based on diffusion models have significant advantages for solving these problems. However, existing generative control methods struggle with closed-loop control, which is inherently required for effective control of complex physical systems. In this paper, we propose a Closed-Loop Diffusion method for Physical systems Control (CL-DiffPhyCon). By adopting an asynchronous denoising schedule for different time steps, CL-DiffPhyCon generates control signals conditioned on real-time feedback from the environment. Thus, CL-DiffPhyCon is able to speed up diffusion control methods in a closed-loop framework. We evaluate CL-DiffPhyCon on 1D Burgers' equation control and 2D incompressible fluid control tasks. The results demonstrate that CL-DiffPhyCon achieves notable control performance with significant sampling acceleration.

The control problem of complex physical systems is a critical area of study that involves optimizing a sequence of control actions to achieve specific objectives. It has important applications across a wide range of science and engineering fields, including fluid control (Verma et al., 2018), plasma control (Degrave et al., 2022), and particle dynamics control (Reyes Garza et al., 2023). The challenge in controlling such systems arises from their high-dimensional, highly nonlinear, and stochastic characteristics. Therefore, achieving effective performance inherently requires closed-loop control.
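
One way to picture an asynchronous denoising schedule is a sliding window of future control signals in which signals further in the future are noisier. In the simplified sketch below, which reflects our own assumptions rather than the paper's algorithm, the window holds `HORIZON` signals that still need 1, 2, ..., `HORIZON` denoising steps. At every environment step, each signal receives one denoising step conditioned on the latest observation; the front signal becomes clean and is applied, and a fresh pure-noise signal is appended. `denoise_step` is a hypothetical placeholder for a trained diffusion model, with a toy target control of -obs.

```python
import numpy as np

HORIZON = 4  # denoising steps per signal = window length (assumption)


def denoise_step(u: float, level: int, obs: float) -> float:
    """Hypothetical one-step denoiser conditioned on the latest observation.

    `level` is how many denoising steps the signal still needs; at level 1 the
    clean control is returned. The target -obs is a toy control law.
    """
    target = -obs
    return u + (target - u) / level


def closed_loop_control(x0: float, n_steps: int, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    # window[i] still needs i + 1 denoising steps; starting from pure noise is a simplification
    window = [float(rng.normal()) for _ in range(HORIZON)]
    x = x0
    for _ in range(n_steps):
        # one denoising step for every signal, all conditioned on the current state
        window = [denoise_step(u, i + 1, x) for i, u in enumerate(window)]
        u0 = window.pop(0)                  # the front signal is now fully denoised
        x = x + 0.5 * u0                    # toy environment transition
        window.append(float(rng.normal()))  # a new signal enters at the highest noise level
    return x
```

Each environment step costs one denoising pass per window entry rather than a full `HORIZON`-step sampling run started from scratch at every step, which is our reading of where a closed-loop speed-up can come from. In this toy, the applied control always equals -x, so the state halves at every step regardless of the random seed.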


Rethinking Closed-loop Planning Framework for Imitation-based Model Integrating Prediction and Planning

arXiv.org Artificial Intelligence

In recent years, the integration of prediction and planning through neural networks has received substantial attention. Despite extensive study, there is a noticeable gap in understanding how such models operate within a closed-loop planning setting. To bridge this gap, we propose a novel closed-loop planning framework compatible with neural networks engaged in joint prediction and planning. The framework has two running modes, planning and safety monitoring, in which the neural network performs Motion Prediction and Planning (MPP) and Conditional Motion Prediction (CMP), respectively, without altering its architecture. We evaluate the efficacy of our framework using the nuPlan dataset and its simulator, conducting closed-loop experiments across diverse scenarios. The results demonstrate that the proposed framework ensures the feasibility and local stability of the planning process while maintaining safety with CMP safety monitoring. Compared to other learning-based methods, our approach achieves substantial improvement.


Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models

arXiv.org Artificial Intelligence

Object hallucination has been an Achilles' heel hindering broader applications of large vision-language models (LVLMs). Object hallucination refers to the phenomenon in which an LVLM claims that non-existent objects are present in the image. To mitigate object hallucinations, instruction tuning and external model-based detection methods have been proposed, which either require large-scale computational resources or depend on the detection results of external models. However, using the LVLM itself to alleviate object hallucinations remains under-explored. In this work, we adopt the intuition that an LVLM tends to respond logically consistently for existent objects but inconsistently for hallucinated objects. We therefore propose a Logical Closed Loop-based framework for Object Hallucination Detection and Mitigation, namely LogicCheckGPT. Specifically, we devise logical consistency probing to raise questions with logical correlations, inquiring about attributes from objects and vice versa. Whether the responses form a logical closed loop serves as an indicator of object hallucination. As a plug-and-play method, it can be seamlessly applied to all existing LVLMs. Comprehensive experiments conducted on three benchmarks across four LVLMs demonstrate significant improvements brought by our method, indicating its effectiveness and generality.
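
The logical closed loop can be sketched as two rounds of questioning: object to attribute, then attribute back to object, flagging the object as hallucinated when the second answer does not lead back to it. In this minimal sketch, `ask` is a hypothetical callable wrapping an LVLM and `toy_lvlm` is a dictionary-backed fake model for illustration; the real framework probes multiple attributes and aggregates the results, whereas this toy uses a single colour question.

```python
from typing import Callable


def is_hallucinated(obj: str, ask: Callable[[str], str]) -> bool:
    """Return True if object -> attribute -> object does not close the loop."""
    attribute = ask(f"What color is the {obj} in the image?")  # object -> attribute
    answer = ask(f"What object in the image is {attribute}?")  # attribute -> object
    return obj.lower() not in answer.lower()                   # does the loop close?


def toy_lvlm(prompt: str) -> str:
    """Hypothetical fake LVLM: consistent about the dog, inconsistent about a 'cat'."""
    answers = {
        "What color is the dog in the image?": "brown",
        "What object in the image is brown?": "The dog",
        "What color is the cat in the image?": "black",
        "What object in the image is black?": "The car tire",
    }
    return answers.get(prompt, "I don't know")
```

Here the dog closes the loop and is kept, while the claimed cat leads to a different object and is flagged. Because only the model's own answers are used, the check needs no external detector, which is what makes the approach plug-and-play.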


Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning

arXiv.org Artificial Intelligence

Integrating a vision module into a closed-loop control system for seamless robot movement in a manipulation task is challenging because the modules involved update at inconsistent rates. The task is even more difficult in a dynamic environment, e.g., when objects are moving. This paper presents a modular zero-shot framework for language-driven manipulation of (dynamic) objects through a closed-loop control system with real-time trajectory replanning and online 6D object pose localization. We segment an object within 0.5 s by leveraging a vision language model via language commands. Then, guided by natural language commands, a closed-loop system comprising unified pose estimation and tracking and online trajectory planning continuously tracks the object and computes the optimal trajectory in real time. Our proposed zero-shot framework provides a smooth trajectory that avoids jerky movements and ensures the robot can grasp a non-stationary object. Experimental results demonstrate the real-time capability of the proposed zero-shot modular framework to accurately and efficiently grasp moving objects, with update rates of up to 30 Hz for the online 6D pose localization module and 10 Hz for the receding-horizon trajectory optimization. These advantages highlight the modular framework's potential applications in robotics and human-robot interaction; see the video at https://www.acin.tuwien.ac.at/en/6e64/.
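
A simple way to reconcile the inconsistent update rates is to run each module on its own clock and have the slower module always consume the freshest output of the faster one. The sketch below assumes a 30 Hz pose module and a 10 Hz receding-horizon planner driven by one simulated clock; `estimate_pose`, `plan_trajectory`, and the straight-line planner are hypothetical stand-ins, not the paper's modules.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_pose(t: float) -> Point:
    """Hypothetical pose module: the object slides along x at 0.2 m/s."""
    return (0.2 * t, 0.0)


def plan_trajectory(pose: Point, n_waypoints: int = 5) -> List[Point]:
    """Hypothetical planner: straight-line waypoints from the origin to the latest pose."""
    return [(pose[0] * k / n_waypoints, pose[1] * k / n_waypoints)
            for k in range(1, n_waypoints + 1)]


def run_multirate(duration_s: float, pose_hz: int = 30, plan_hz: int = 10):
    """Simulate a fast pose loop and a slow replanning loop on one clock."""
    if pose_hz % plan_hz != 0:
        raise ValueError("this sketch assumes the planner rate divides the pose rate")
    ratio = pose_hz // plan_hz
    n_ticks = round(duration_s * pose_hz)
    plans = []
    for k in range(n_ticks):
        latest_pose = estimate_pose(k / pose_hz)        # fast loop: refresh the object pose
        if k % ratio == 0:
            plans.append(plan_trajectory(latest_pose))  # slow loop: replan from the freshest pose
    return n_ticks, plans
```

Over one second this yields 30 pose updates and 10 replans, each replan using the most recent pose. On a real robot the two loops would usually run in separate threads or processes that share the latest pose, but the "consume the freshest estimate" rule is the same.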