Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference
