Detoxification of Large Language Models through Output-layer Fusion with a Calibration Model
