Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data
