Compact Proofs of Model Performance via Mechanistic Interpretability
