ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
