Scalable Valuation of Human Feedback through Provably Robust Model Alignment
