On Corruption-Robustness in Performative Reinforcement Learning

Pollatos, Vasilis, Mandal, Debmalya, Radanovic, Goran

May-12-2025–arXiv.org Artificial Intelligence

In performative Reinforcement Learning (RL), an agent faces a policy-dependent environment: the reward and transition functions depend on the agent's policy. Prior work on performative RL has studied the convergence of repeated retraining approaches to a performatively stable policy. In the finite sample regime, these approaches repeatedly solve for a saddle point of a convex-concave objective, which estimates the Lagrangian of a regularized version of the reinforcement learning problem. In this paper, we aim to extend such repeated retraining approaches, enabling them to operate under corrupted data. More specifically, we consider Huber's $\epsilon$-contamination model, where an $\epsilon$ fraction of data points is corrupted by arbitrary adversarial noise. We propose a repeated retraining approach based on convex-concave optimization under corrupted gradients and a novel problem-specific robust mean estimator for the gradients. We prove that our approach exhibits last-iterate convergence to an approximately stable policy, with the approximation error linear in $\sqrt{\epsilon}$. We experimentally demonstrate the importance of accounting for corruption in performative RL.
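
To make the contamination setting concrete, the following is a minimal sketch, not the paper's estimator. It assumes a simplified form of Huber's model in which a fixed $\epsilon$ fraction of gradient samples is overwritten with one adversarial value, and it swaps in a generic coordinate-wise trimmed mean as the robust estimator. The names `contaminate`, `trimmed_mean`, and `true_grad`, along with all numeric values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def contaminate(samples, eps, outlier, rng):
    """Overwrite an eps fraction of rows with an adversarial value.

    Simplifying assumption: a fixed fraction is corrupted, rather than each
    sample being drawn from a mixture as in Huber's model.
    """
    corrupted = samples.copy()
    n_bad = int(eps * len(samples))
    idx = rng.choice(len(samples), size=n_bad, replace=False)
    corrupted[idx] = outlier
    return corrupted


def trimmed_mean(samples, eps):
    """Coordinate-wise trimmed mean (a generic robust estimator, assumed here).

    Sorts each coordinate independently, drops the k smallest and k largest
    values with k = ceil(eps * n), and averages what remains.
    """
    n = len(samples)
    k = int(np.ceil(eps * n))
    sorted_samples = np.sort(samples, axis=0)
    return sorted_samples[k:n - k].mean(axis=0)


rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])  # hypothetical "clean" gradient
clean = true_grad + 0.1 * rng.standard_normal((1000, 3))
corrupted = contaminate(clean, eps=0.1, outlier=np.full(3, 100.0), rng=rng)

# The plain average is dragged toward the outliers; the trimmed mean is not.
naive_err = np.linalg.norm(corrupted.mean(axis=0) - true_grad)
robust_err = np.linalg.norm(trimmed_mean(corrupted, eps=0.1) - true_grad)
print(f"naive mean error:   {naive_err:.3f}")
print(f"trimmed mean error: {robust_err:.3f}")
```

In this toy setup the plain average is pulled far from `true_grad` by the corrupted rows, while the trimmed mean stays close to it. That gap is the kind of failure a robust gradient estimator is meant to prevent inside a retraining loop.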

artificial intelligence, machine learning, reinforcement learning

Country:
- Europe (0.45)
- North America > United States (0.28)

Genre:
- Research Report (0.50)

Industry:
- Education (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (0.90)
  - Representation & Reasoning > Optimization (0.68)
