An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l
