Gain Tuning Is Not What You Need: Reward Gain Adaptation for Constrained Locomotion Learning

Srisuchinnawong, Arthicha, Manoonpong, Poramate

Oct-14-2025–arXiv.org Artificial Intelligence

Figure 1: (a) Parameter trajectories from RL (white), constrained RL (black), and ROGER (brown) on a simulated reward landscape, with their explorations shown as transparent overlays. Brighter regions indicate higher rewards, while darker regions indicate lower rewards. The red areas highlight violations, with a red dashed line indicating the constraint threshold. RL and constrained RL consistently violate constraints, possibly during exploration, while ROGER effectively avoids violations. A video of this experiment is available at https://youtu.be/Cqu7vL T Piw?si=jtzJCpRubbFHx06w.

Abstract: Existing robot locomotion learning techniques rely heavily on the offline selection of proper reward-weighting gains and cannot guarantee constraint satisfaction during training (i.e., constraint violations may occur). This work addresses both issues by proposing Reward-Oriented Gains via Embodied Regulation (ROGER), which adapts reward-weighting gains online based on penalties received throughout the embodied interaction process. The ratio between the positive reward (primary reward) gain and the negative reward (penalty) gain is automatically reduced as learning approaches the constraint thresholds, to avoid violation. Conversely, the ratio is increased when learning is in safe states, to prioritize performance. With a 60-kg quadruped robot, ROGER achieved near-zero constraint violation throughout multiple learning trials. It also achieved up to 50% more primary reward than the equivalent state-of-the-art techniques. In MuJoCo continuous locomotion benchmarks, including a single-leg hopper, ROGER exhibited comparable or up to 100% higher performance, along with 60% less torque usage and orientation deviation, compared with training on the default reward function.
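The gain-ratio behavior described in the abstract can be illustrated with a minimal sketch. This is not the authors' update rule: the linear schedule, the normalization of the two gains, and the names `gain_ratio`, `weighted_reward`, `threshold`, and `max_ratio` are all assumptions made for illustration. The sketch only shows the qualitative idea that the reward-to-penalty gain ratio shrinks as the observed penalty nears a constraint threshold and grows again in safe states.

```python
def gain_ratio(penalty: float, threshold: float, max_ratio: float = 1.0) -> float:
    """Hypothetical schedule (an assumption, not the paper's rule).

    The ratio falls linearly from max_ratio (penalty = 0, safe) to 0
    (penalty at or beyond the constraint threshold).
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Fraction of the constraint budget already used, clipped to [0, 1].
    used = min(max(penalty / threshold, 0.0), 1.0)
    return max_ratio * (1.0 - used)


def weighted_reward(primary: float, penalty: float, threshold: float,
                    max_ratio: float = 1.0) -> float:
    """Combine primary reward and penalty using the adapted gain ratio.

    The two gains are normalized to sum to 1 (an illustrative choice).
    """
    ratio = gain_ratio(penalty, threshold, max_ratio)
    w_reward = ratio / (1.0 + ratio)   # gain on the primary reward
    w_penalty = 1.0 / (1.0 + ratio)    # gain on the penalty
    return w_reward * primary - w_penalty * penalty


if __name__ == "__main__":
    # Sweep the penalty from safe (0.0) to the threshold (1.0).
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, gain_ratio(p, threshold=1.0), weighted_reward(2.0, p, threshold=1.0))
```

In this sketch, a penalty of zero weights the primary reward and the penalty equally, while a penalty at the threshold removes the primary reward from the learning signal entirely, so only the penalty drives the update.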

artificial intelligence, machine learning, roger


Country:
- Asia > Thailand
  - Rayong > Rayong (0.04)
- Europe > Denmark
  - Southern Denmark (0.04)
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre:
- Research Report
  - Experimental Study (0.32)
  - New Finding (0.30)
  - Promising Solution (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Representation & Reasoning (1.00)
  - Robots > Locomotion (1.00)
