Post-Convergence Sim-to-Real Policy Transfer: A Principled Alternative to Cherry-Picking

Apr-23-2025–arXiv.org Artificial Intelligence

Dylan Khor and Bowen Weng

Abstract: Learning-based approaches, particularly reinforcement learning (RL), have become widely used for developing control policies for autonomous agents, such as locomotion policies for legged robots. Starting from a randomly initialized policy, the empirical expected reward follows a trajectory with an overall increasing trend. While some policies become temporarily stuck in local optima, a well-defined training process generally converges to a reward level with noisy oscillations. However, selecting a policy for real-world deployment is rarely an analytical decision (i.e., simply choosing the one with the highest reward) and is instead often performed through trial and error. To improve sim-to-real transfer, most research focuses on the pre-convergence stage, employing techniques such as domain randomization, multi-fidelity training, adversarial training, and architectural innovations. However, these methods do not eliminate the inevitable convergence trajectory and noisy oscillations of rewards, leading to heuristic policy selection or cherry-picking. This paper addresses the post-convergence sim-to-real transfer problem by introducing a worst-case performance transference optimization approach, formulated as a convex quadratically constrained linear programming problem. Extensive experiments demonstrate its effectiveness in transferring RL-based locomotion policies from simulation to real-world laboratory tests.

I. INTRODUCTION

Figure 1 (b) illustrates the average reward trajectory from training a locomotion policy for the Unitree G1 humanoid robot in Isaac Gym using reinforcement learning (RL) [1] with random seed 50. Initially, the randomly initialized policy yields a low training reward.
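The abstract names the selection step as a convex quadratically constrained linear program but does not state its variables or constraints. As a purely generic illustration, and not the authors' formulation, the smallest problem of that class is a linear score minimized over a Euclidean ball. That is exactly a worst-case evaluation of the score under a bounded perturbation, and it has a closed form: the minimum of $c^\top x$ over $\|x - x_0\|_2 \le r$ is $c^\top x_0 - r\|c\|_2$. The sketch below assumes this toy setup; the names `worst_case_linear_score`, `select_policy`, `c`, `x0`, `radius`, and the two candidate policies are all hypothetical.

```python
import numpy as np


def worst_case_linear_score(c, x0, radius):
    """Solve min c @ x subject to ||x - x0||_2 <= radius.

    A convex QCLP: linear objective, one convex quadratic constraint.
    Hypothetical reading: x is a vector of performance features, x0 its
    simulated value, and the ball bounds the sim-to-real deviation.
    Returns (worst-case score, minimizing point).
    """
    c = np.asarray(c, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    if radius < 0:
        raise ValueError("radius must be non-negative")
    norm_c = np.linalg.norm(c)
    if norm_c == 0.0:
        # Zero objective: the score is 0 everywhere, so x0 is as good as any point.
        return 0.0, x0.copy()
    # Cauchy-Schwarz gives c @ (x - x0) >= -||c|| * radius on the ball, with
    # equality at the boundary point lying in the direction opposite to c.
    x_star = x0 - radius * c / norm_c
    return float(c @ x0 - radius * norm_c), x_star


def select_policy(c, candidates):
    """Pick the candidate whose worst-case score is highest.

    `candidates` maps a hypothetical policy name to (x0, radius).
    """
    worst = {name: worst_case_linear_score(c, x0, r)[0]
             for name, (x0, r) in candidates.items()}
    return max(worst, key=worst.get), worst


if __name__ == "__main__":
    c = np.array([3.0, 4.0])  # ||c|| = 5
    candidates = {
        # nominal 3*1.0 + 4*1.0 = 7, worst case 7 - 0.2*5 = 6
        "policy_a": (np.array([1.0, 1.0]), 0.2),
        # nominal 3*1.2 + 4*1.1 = 8, worst case 8 - 0.6*5 = 5
        "policy_b": (np.array([1.2, 1.1]), 0.6),
    }
    best, worst = select_policy(c, candidates)
    print(best, worst)
```

In this toy example, ranking by nominal score alone would favor `policy_b`, while the worst-case criterion favors `policy_a` because its smaller deviation radius costs less. The contrast mirrors the abstract's point that the highest-reward policy is not automatically the right one to deploy. How the paper actually parameterizes policies and constraints is not given here.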

artificial intelligence, machine learning, real-world performance
