Modeling Human Beliefs about AI Behavior for Scalable Oversight
