AITopics | recovery policy

Collaborating Authors

recovery policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Fare: Failure Resilience in Learned Visual Navigation Control

Wang, Zishuo, Loo, Joel, Hsu, David

arXiv.org Artificial IntelligenceOct-29-2025

Abstract-- While imitation learning (IL) enables effective visual navigation, IL policies are prone to unpredictable failures in out-of-distribution (OOD) scenarios. We advance the notion of failure-resilient policies, which not only detect failures but also recover from them automatically. F ailure recognition that identifies the factors causing failure is key to informing recovery: e.g. We present F are, a framework to construct failure-resilient IL policies, embedding OOD-detection and recognition in them without using explicit failure data, and pairing them with recovery heuristics. Real-world experiments show that F are enables failure recovery across two different policy architectures, enabling robust long-range navigation in complex environments. Visual navigation is an attractive approach to robot navigation, leveraging rich visual information from low-cost sensors [1]. Imitation learning (IL) has emerged as a key method to learn visual navigation policies [2]-[4], but is inherently limited by training data. IL policies may fail unpredictably on inputs outside the training distribution, often without clear explanation [5]-[7]. This work develops a mechanism to enable IL policies to detect and recover from failures, supporting robust open-world navigation.

artificial intelligence, machine learning, recognition, (18 more...)

arXiv.org Artificial Intelligence

2510.2468

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails

Pandya, Ravi, Bland, Madison, Nguyen, Duy P., Liu, Changliu, Fisac, Jaime Fernández, Bajcsy, Andrea

arXiv.org Artificial IntelligenceOct-16-2025

Generative AI systems are increasingly assisting and acting on behalf of end users in practical settings, from digital shopping assistants to next-generation autonomous cars. In this context, safety is no longer about blocking harmful content, but about preempting downstream hazards like financial or physical harm. Yet, most AI guardrails continue to rely on output classification based on labeled datasets and human-specified criteria,making them brittle to new hazardous situations. Even when unsafe conditions are flagged, this detection offers no path to recovery: typically, the AI system simply refuses to act--which is not always a safe choice. In this work, we argue that agentic AI safety is fundamentally a sequential decision problem: harmful outcomes arise from the AI system's continually evolving interactions and their downstream consequences on the world. We formalize this through the lens of safety-critical control theory, but within the AI model's latent representation of the world. This enables us to build predictive guardrails that (i) monitor an AI system's outputs (actions) in real time and (ii) proactively correct risky outputs to safe ones, all in a model-agnostic manner so the same guardrail can be wrapped around any AI model. We also offer a practical training recipe for computing such guardrails at scale via safety-critical reinforcement learning. Our experiments in simulated driving and e-commerce settings demonstrate that control-theoretic guardrails can reliably steer LLM agents clear of catastrophic outcomes (from collisions to bankruptcy) while preserving task performance, offering a principled dynamic alternative to today's flag-and-block guardrails.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.13727

Genre:

Instructional Material (0.48)
Research Report (0.40)

Industry:

Information Technology > Services > e-Commerce Services (0.54)
Transportation > Ground > Road (0.48)
Information Technology > Robotics & Automation (0.48)
Transportation > Passenger (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Failure Forecasting Boosts Robustness of Sim2Real Rhythmic Insertion Policies

Liu, Yuhan, Zhang, Xinyu, Chang, Haonan, Boularias, Abdeslam

arXiv.org Artificial IntelligenceJul-10-2025

This paper addresses the challenges of Rhythmic Insertion Tasks (RIT), where a robot must repeatedly perform high-precision insertions, such as screwing a nut into a bolt with a wrench. The inherent difficulty of RIT lies in achieving millimeter-level accuracy and maintaining consistent performance over multiple repetitions, particularly when factors like nut rotation and friction introduce additional complexity. We propose a sim-to-real framework that integrates a reinforcement learning-based insertion policy with a failure forecasting module. By representing the wrench's pose in the nut's coordinate frame rather than the robot's frame, our approach significantly enhances sim-to-real transferability. The insertion policy, trained in simulation, leverages real-time 6D pose tracking to execute precise alignment, insertion, and rotation maneuvers. Simultaneously, a neural network predicts potential execution failures, triggering a simple recovery mechanism that lifts the wrench and retries the insertion. Extensive experiments in both simulated and real-world environments demonstrate that our method not only achieves a high one-time success rate but also robustly maintains performance over long-horizon repetitive tasks.

artificial intelligence, insertion policy, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2507.06519

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion

Lin, Sixu, Qiao, Guanren, Tai, Yunxin, Li, Ang, Jia, Kui, Liu, Guiliang

arXiv.org Artificial IntelligenceMar-10-2025

Humanoid robots, capable of assuming human roles in various workplaces, have become essential to the advancement of embodied intelligence. However, as robots with complex physical structures, learning a control model that can operate robustly across diverse environments remains inherently challenging, particularly under the discrepancies between training and deployment environments. In this study, we propose HWC-Loco, a robust whole-body control algorithm tailored for humanoid locomotion tasks. By reformulating policy learning as a robust optimization problem, HWC-Loco explicitly learns to recover from safety-critical scenarios. While prioritizing safety guarantees, overly conservative behavior can compromise the robot's ability to complete the given tasks. To tackle this challenge, HWC-Loco leverages a hierarchical policy for robust control. This policy can dynamically resolve the trade-off between goal-tracking and safety recovery, guided by human behavior norms and dynamic constraints. To evaluate the performance of HWC-Loco, we conduct extensive comparisons against state-of-the-art humanoid control models, demonstrating HWC-Loco's superior performance across diverse terrains, robot structures, and locomotion tasks under both simulated and real-world environments.

humanoid robot, robot, submission and formatting instruction, (13 more...)

arXiv.org Artificial Intelligence

2503.00923

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (0.84)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Out-of-Distribution Recovery with Object-Centric Keypoint Inverse Policy For Visuomotor Imitation Learning

Gao, George Jiayuan, Li, Tianyu, Figueroa, Nadia

arXiv.org Artificial IntelligenceNov-6-2024

We propose an object-centric recovery policy framework to address the challenges of out-of-distribution (OOD) scenarios in visuomotor policy learning. Previous behavior cloning (BC) methods rely heavily on a large amount of labeled data coverage, failing in unfamiliar spatial states. Without relying on extra data collection, our approach learns a recovery policy constructed by an inverse policy inferred from object keypoint manifold gradient in the original training data. The recovery policy serves as a simple add-on to any base visuomotor BC policy, agnostic to a specific method, guiding the system back towards the training distribution to ensure task success even in OOD situations. We demonstrate the effectiveness of our object-centric framework in both simulation and real robot experiments, achieving an improvement of 77.7% over the base policy in OOD. Project Website: https://sites.google.com/view/ocr-penn

arxiv preprint arxiv, base policy, scenario, (12 more...)

arXiv.org Artificial Intelligence

2411.03294

Country:

North America > United States > Pennsylvania (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Add feedback

RecoveryChaining: Learning Local Recovery Policies for Robust Manipulation

Vats, Shivam, Jha, Devesh K., Likhachev, Maxim, Kroemer, Oliver, Romeres, Diego

arXiv.org Artificial IntelligenceOct-17-2024

Model-based planners and controllers are commonly used to solve complex manipulation problems as they can efficiently optimize diverse objectives and generalize to long horizon tasks. However, they are limited by the fidelity of their model which oftentimes leads to failures during deployment. To enable a robot to recover from such failures, we propose to use hierarchical reinforcement learning to learn a separate recovery policy. The recovery policy is triggered when a failure is detected based on sensory observations and seeks to take the robot to a state from which it can complete the task using the nominal model-based controllers. Our approach, called RecoveryChaining, uses a hybrid action space, where the model-based controllers are provided as additional \emph{nominal} options which allows the recovery policy to decide how to recover, when to switch to a nominal controller and which controller to switch to even with \emph{sparse rewards}. We evaluate our approach in three multi-step manipulation tasks with sparse rewards, where it learns significantly more robust recovery policies than those learned by baselines. Finally, we successfully transfer recovery policies learned in simulation to a physical robot to demonstrate the feasibility of sim-to-real transfer with our method.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2410.13979

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > South Korea > Daegu > Daegu (0.04)

Genre: Research Report (0.82)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

CIMRL: Combining IMitation and Reinforcement Learning for Safe Autonomous Driving

Booher, Jonathan, Rohanimanesh, Khashayar, Xu, Junhong, Isenbaev, Vladislav, Balakrishna, Ashwin, Gupta, Ishan, Liu, Wei, Petiushko, Aleksandr

arXiv.org Artificial IntelligenceJun-26-2024

Modern approaches to autonomous driving rely heavily on learned components trained with large amounts of human driving data via imitation learning. However, these methods require large amounts of expensive data collection and even then face challenges with safely handling long-tail scenarios and compounding errors over time. At the same time, pure Reinforcement Learning (RL) methods can fail to learn performant policies in sparse, constrained, and challenging-to-define reward settings like driving. Both of these challenges make deploying purely cloned policies in safety critical applications like autonomous vehicles challenging. In this paper we propose Combining IMitation and Reinforcement Learning (CIMRL) approach - a framework that enables training driving policies in simulation through leveraging imitative motion priors and safety constraints. CIMRL does not require extensive reward specification and improves on the closed loop behavior of pure cloning methods. By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results in closed loop simulation driving benchmarks.

action space, algorithm, reinforcement learning, (11 more...)

arXiv.org Artificial Intelligence

2406.08878

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.66)

Industry:

Transportation > Ground > Road (0.71)
Information Technology > Robotics & Automation (0.71)
Automobiles & Trucks (0.71)
Energy (0.70)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Add feedback

SRL-VIC: A Variable Stiffness-Based Safe Reinforcement Learning for Contact-Rich Robotic Tasks

Zhang, Heng, Solak, Gokhan, Lahr, Gustavo J. G., Ajoudani, Arash

arXiv.org Artificial IntelligenceJun-19-2024

Reinforcement learning (RL) has emerged as a promising paradigm in complex and continuous robotic tasks, however, safe exploration has been one of the main challenges, especially in contact-rich manipulation tasks in unstructured environments. Focusing on this issue, we propose SRL-VIC: a model-free safe RL framework combined with a variable impedance controller (VIC). Specifically, safety critic and recovery policy networks are pre-trained where safety critic evaluates the safety of the next action using a risk value before it is executed and the recovery policy suggests a corrective action if the risk value is high. Furthermore, the policies are updated online where the task policy not only achieves the task but also modulates the stiffness parameters to keep a safe and compliant profile. A set of experiments in contact-rich maze tasks demonstrate that our framework outperforms the baselines (without the recovery mechanism and without the VIC), yielding a good trade-off between efficient task accomplishment and safety guarantee. We show our policy trained on simulation can be deployed on a physical robot without fine-tuning, achieving successful task completion with robustness and generalization. The video is available at https://youtu.be/ksWXR3vByoQ.

contact-rich task, obstacle, robot, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LRA.2024.3396368

2406.13744

Country: Europe > Italy > Liguria > Genoa (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Multi-Constraint Safe RL with Objective Suppression for Safety-Critical Applications

Zhou, Zihan, Booher, Jonathan, Rohanimanesh, Khashayar, Liu, Wei, Petiushko, Aleksandr, Garg, Animesh

arXiv.org Artificial IntelligenceApr-15-2024

Safe reinforcement learning tasks with multiple constraints are a challenging domain despite being very common in the real world. In safety-critical domains, properly handling the constraints becomes even more important. To address this challenge, we first describe the multi-constraint problem with a stronger Uniformly Constrained MDP (UCMDP) model; we then propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic, as a solution to the Lagrangian dual of a UCMDP. We benchmark Objective Suppression in two multi-constraint safety domains, including an autonomous driving domain where any incorrect behavior can lead to disastrous consequences. Empirically, we demonstrate that our proposed method, when combined with existing safe RL algorithms, can match the task reward achieved by our baselines with significantly fewer constraint violations.

constraint, objective, objective suppression, (13 more...)

arXiv.org Artificial Intelligence

2402.1565

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Add feedback

Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy

Cao, Chenyang, Yan, Zichen, Lu, Renhao, Tan, Junbo, Wang, Xueqian

arXiv.org Artificial IntelligenceMar-4-2024

Offline goal-conditioned reinforcement learning (GCRL) aims at solving goal-reaching tasks with sparse rewards from an offline dataset. While prior work has demonstrated various approaches for agents to learn near-optimal policies, these methods encounter limitations when dealing with diverse constraints in complex environments, such as safety constraints. Some of these approaches prioritize goal attainment without considering safety, while others excessively focus on safety at the expense of training efficiency. In this paper, we study the problem of constrained offline GCRL and propose a new method called Recovery-based Supervised Learning (RbSL) to accomplish safety-critical tasks with various goals. To evaluate the method performance, we build a benchmark based on the robot-fetching environment with a randomly positioned obstacle and use expert or random policies to generate an offline dataset. We compare RbSL with three offline GCRL algorithms and one offline safe RL algorithm. As a result, our method outperforms the existing state-of-the-art methods to a large extent. Furthermore, we validate the practicality and effectiveness of RbSL by deploying it on a real Panda manipulator. Code is available at https://github.com/Sunlighted/RbSL.git.

constraint, learning, recovery policy, (13 more...)

arXiv.org Artificial Intelligence

2403.01734

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback