Predictive Red Teaming: Breaking Policies Without Breaking Robots
Anirudha Majumdar, Mohit Sharma, Dmitry Kalashnikov, Sumeet Singh, Pierre Sermanet, Vikas Sindhwani
arXiv.org Artificial Intelligence
Is it possible to expose the vulnerabilities of a given robot policy with respect to changes in environmental factors such as lighting, visual distractors, and object placement, without performing hardware evaluations in these scenarios? As we seek to deploy robots in environments of ever-increasing complexity, it becomes imperative to develop scalable methods for predicting how well they will generalize to unseen scenarios. Performing hardware evaluations to discover vulnerabilities -- which can depend in surprising ways on the specifics of policy training and architecture -- is often prohibitively expensive to set up and execute, especially when the goal is to test the limits of safe deployment across a sufficiently diverse set of scenarios. As an example, consider a visuomotor diffusion policy [1] trained to perform pick-and-place tasks via behavior cloning (Figure 1). The policy is trained on a large dataset: over 3,000 demonstrations with varied objects, locations, and visual distractors. Will it generalize better to a change of a few centimeters in table height (as one might plausibly predict, given the variation in 2D object locations in the training data) than to a human standing closer to the table than seen during training? And what is the absolute degradation in success rate in each case? As it turns out, the prediction above is incorrect: the policy's success rate drops from 65% under nominal conditions to 10% when the table height is changed, yet remains roughly constant with a human close to the table. Predicting the relative and absolute impact of other factors (e.g., lighting, table backgrounds, object distractors; Figure 1) can be even more challenging.
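To make the notion of absolute degradation concrete, the minimal sketch below tabulates per-factor drops in success rate relative to nominal conditions, using only the example figures quoted above (the human-proximity value is approximated by the nominal rate, since the text says it remains roughly constant). The factor names, variable names, and code structure are illustrative assumptions, not part of the paper's method.

```python
# Illustrative sketch: absolute success-rate degradation per environmental factor,
# using the example figures quoted in the abstract (not actual experimental data).
nominal_success = 0.65  # success rate under nominal conditions

# Approximate success rates under each shifted condition (factor names assumed).
# "human_near_table" is set equal to the nominal rate because the abstract says
# the success rate "remains roughly constant" in that condition.
shifted_success = {
    "table_height_change": 0.10,
    "human_near_table": 0.65,
}

for factor, rate in shifted_success.items():
    drop = nominal_success - rate  # absolute degradation in success rate
    print(f"{factor}: {drop:.0%} absolute drop vs. nominal")
```

Running the sketch simply reproduces the contrast described above: a 55-percentage-point drop for the table-height change versus essentially no drop with a human near the table.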
Feb-10-2025