Extending Group Relative Policy Optimization to Continuous Control: A Theoretical Framework for Robotic Reinforcement Learning

Rajat Khanda, Mohammad Baqar, Sambuddha Chakrabarti, Satyasaran Changdar

arXiv.org Artificial Intelligence 

Reinforcement Learning (RL) has achieved remarkable success across diverse domains, from game playing [1] to robotic control [2]. However, traditional policy optimization methods face significant challenges in continuous control settings, particularly in robotics, where high-dimensional action spaces, sparse rewards, and sample inefficiency pose persistent obstacles [3]. Recent policy optimization methods such as Proximal Policy Optimization (PPO) [4] and Soft Actor-Critic (SAC) [5] address some of these challenges through distinct techniques: PPO employs a clipped surrogate objective to keep updates stable, while SAC uses entropy regularization to encourage exploration and improve robustness. Both methods nonetheless rely heavily on value function approximation, which can introduce bias and instability, particularly in the high-dimensional or sparse-reward environments common in robotics [6]. Group Relative Policy Optimization (GRPO) [7] offers an alternative that eliminates the reliance on value function approximation through group-based advantage estimation. Initially developed for discrete action spaces, GRPO has demonstrated improved stability and sample efficiency in tasks such as language model alignment.
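
To make the idea of group-based advantage estimation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly used GRPO formulation in which several rollouts are sampled for the same starting condition and each rollout's scalar reward is standardized against the mean and standard deviation of its own group, so no learned value function is needed as a baseline. The function name `group_relative_advantages` and the `eps` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own group (illustrative sketch).

    rewards: 1-D sequence of scalar returns, one per rollout in the group.
    eps:     assumed small constant that avoids division by zero when all
             rewards in the group are identical.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    baseline = rewards.mean()   # the group mean stands in for a value estimate
    scale = rewards.std() + eps  # population standard deviation of the group
    return (rewards - baseline) / scale


if __name__ == "__main__":
    # Four hypothetical rollouts from the same start state.
    group_rewards = [0.0, 1.0, 1.0, 4.0]
    print(group_relative_advantages(group_rewards))
```

Rollouts that score above the group mean receive positive advantages and those below receive negative ones; a group with identical rewards yields all-zero advantages and therefore contributes no gradient signal.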
