Reward Modeling for Mitigating Toxicity in Transformer-based Language Models

Open in new window