Do Natural Language Descriptions of Model Activations Convey Privileged Information?
