Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
