Corrigibility as a Singular Target: A Vision for Inherently Reliable Foundation Models
