Can Language Models Explain Their Own Classification Behavior?
