Interpretability as Alignment: Making Internal Understanding a Design Principle
