Scalable agent alignment via reward modeling: a research direction
