Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach
