Learning Without Critics? Revisiting GRPO in Classical Reinforcement Learning Environments
