Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models