Post-Convergence Sim-to-Real Policy Transfer: A Principled Alternative to Cherry-Picking
arXiv.org Artificial Intelligence
Dylan Khor and Bowen Weng

Abstract -- Learning-based approaches, particularly reinforcement learning (RL), have become widely used for developing control policies for autonomous agents, such as locomotion policies for legged robots. Starting from a randomly initialized policy, the empirical expected reward follows a trajectory with an overall increasing trend. While some policies become temporarily stuck in local optima, a well-defined training process generally converges to a reward level around which the reward keeps oscillating noisily. However, selecting a policy for real-world deployment is rarely an analytical decision (i.e., simply choosing the one with the highest reward); instead, it is often made through trial and error. To improve sim-to-real transfer, most research focuses on the pre-convergence stage, employing techniques such as domain randomization, multi-fidelity training, adversarial training, and architectural innovations. However, these methods do not eliminate the convergence trajectory or the noisy reward oscillations, which leads to heuristic policy selection, or cherry-picking. This paper addresses the post-convergence sim-to-real transfer problem by introducing a worst-case performance transference optimization approach, formulated as a convex quadratically constrained linear programming problem. Extensive experiments demonstrate its effectiveness in transferring RL-based locomotion policies from simulation to real-world laboratory tests.

I. INTRODUCTION

Figure 1(b) illustrates the average reward trajectory from training a locomotion policy for the Unitree G1 humanoid robot in Isaac Gym using reinforcement learning (RL) [1] with random seed 50. Initially, the randomly initialized policy yields a low training reward.
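The abstract names the problem class (a convex quadratically constrained linear program) but not its decision variables or constraints, so the sketch below does not reproduce the authors' formulation. It only illustrates the simplest member of that class, under our own assumptions: maximize a linear objective c^T x subject to a single convex quadratic constraint ||x - x0||_2^2 <= r^2. The function names and the numbers used are hypothetical. For this special case the Cauchy-Schwarz inequality gives a closed form, x* = x0 + r c / ||c||_2, and random feasible samples serve as a sanity check that no feasible point beats it.

```python
import math
import random


def solve_ball_constrained_lp(c, x0, radius):
    """Maximize c^T x subject to ||x - x0||_2^2 <= radius^2.

    Hypothetical helper: a linear objective with one convex quadratic
    (Euclidean ball) constraint, the simplest convex QCLP. By
    Cauchy-Schwarz, the optimum lies on the ball boundary along c.
    """
    norm_c = math.sqrt(sum(ci * ci for ci in c))
    if norm_c == 0.0:
        # Zero objective: every feasible point is optimal; return the center.
        return list(x0)
    return [xi + radius * ci / norm_c for xi, ci in zip(x0, c)]


def objective(c, x):
    """Linear objective c^T x."""
    return sum(ci * xi for ci, xi in zip(c, x))


def is_feasible(x, x0, radius, tol=1e-9):
    """Check the quadratic constraint ||x - x0||_2^2 <= radius^2."""
    return sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0)) <= radius ** 2 + tol


def sample_feasible(x0, radius, rng):
    """Draw a random point inside the ball.

    Not uniform over the ball volume; adequate for a sanity check.
    """
    d = [rng.gauss(0.0, 1.0) for _ in x0]
    n = math.sqrt(sum(v * v for v in d)) or 1.0  # guard against a zero draw
    s = radius * rng.random()
    return [xi + s * di / n for xi, di in zip(x0, d)]


if __name__ == "__main__":
    c = [3.0, 4.0]
    x0 = [1.0, -1.0]
    radius = 2.0

    x_star = solve_ball_constrained_lp(c, x0, radius)
    rng = random.Random(0)
    best_sampled = max(
        objective(c, sample_feasible(x0, radius, rng)) for _ in range(10_000)
    )
    print("closed-form optimum:", x_star, objective(c, x_star))
    print("best random feasible objective:", best_sampled)
```

With c = (3, 4), x0 = (1, -1), and r = 2, the closed form gives x* = (2.2, 0.6) and an objective of c^T x0 + r||c||_2 = -1 + 10 = 9. Instances with several quadratic and linear constraints have no such closed form and are normally passed to a conic or QCQP solver.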
Apr-23-2025
- Country:
  - Europe
    - Netherlands > South Holland
      - Delft (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
  - North America > United States
    - Iowa (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
- Genre:
  - Research Report (1.00)
- Industry:
  - Transportation (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning (1.00)
    - Representation & Reasoning > Optimization (1.00)
    - Robots (1.00)