Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control
Shingo Ayabe, Hiroshi Kera, Kazuhiko Kawamoto
arXiv.org Artificial Intelligence
Offline reinforcement learning enables sample-efficient policy acquisition without risky online interaction, yet policies trained on static datasets remain brittle under action-space perturbations such as actuator faults. This study introduces an offline-to-online framework that trains policies on clean data and then performs adversarial fine-tuning, where perturbations are injected into executed actions to induce compensatory behavior and improve resilience. A performance-aware curriculum further adjusts the perturbation probability during training via an exponential-moving-average signal, balancing robustness and stability throughout the learning process. Experiments on continuous-control locomotion tasks demonstrate that the proposed method consistently improves robustness over offline-only baselines and converges faster than training from scratch. Matching the fine-tuning and evaluation conditions yields the strongest robustness to action-space perturbations, while the adaptive curriculum strategy mitigates the degradation of nominal performance observed with the linear curriculum strategy. Overall, the results show that adversarial fine-tuning enables adaptive and robust control in uncertain environments, bridging the gap between offline efficiency and online adaptability.
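The performance-aware curriculum described in the abstract can be sketched as follows. This is a minimal illustrative assumption, not the authors' implementation: all class and parameter names (`AdaptivePerturbationCurriculum`, `target_return`, the additive step rule, Gaussian action noise) are hypothetical, chosen only to show how an EMA of episode returns might gate the perturbation probability.

```python
import random


class AdaptivePerturbationCurriculum:
    """Hypothetical sketch: adjust the action-perturbation probability
    from an exponential-moving-average (EMA) of recent episode returns.
    The specific thresholds and update rule are illustrative assumptions."""

    def __init__(self, p_init=0.0, p_max=0.5, step=0.05,
                 ema_beta=0.9, target_return=0.8):
        self.p = p_init            # current perturbation probability
        self.p_max = p_max         # cap on perturbation probability
        self.step = step           # curriculum increment/decrement
        self.ema_beta = ema_beta   # EMA smoothing factor
        self.ema = None            # EMA of (normalized) episode return
        self.target = target_return

    def update(self, episode_return):
        # Track performance with an EMA signal.
        if self.ema is None:
            self.ema = episode_return
        else:
            self.ema = self.ema_beta * self.ema + (1 - self.ema_beta) * episode_return
        # Raise perturbation probability only while performance holds up;
        # back off when the EMA drops, trading robustness against stability.
        if self.ema >= self.target:
            self.p = min(self.p_max, self.p + self.step)
        else:
            self.p = max(0.0, self.p - self.step)
        return self.p

    def perturb(self, action, noise_scale=0.2):
        # With probability p, perturb the executed action (e.g., to emulate
        # an actuator fault) so the policy learns compensatory behavior.
        if random.random() < self.p:
            return [a + random.gauss(0.0, noise_scale) for a in action]
        return action
```

In this sketch, the perturbation probability ramps up while the EMA of returns stays above a target (a linear curriculum would ramp unconditionally), which mirrors the paper's claim that the adaptive strategy protects nominal performance.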
Oct-16-2025