Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
