The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives
