Group-Aware Reinforcement Learning for Output Diversity in Large Language Models

Open in new window