Optimizing Language Models for Human Preferences is a Causal Inference Problem

Open in new window