Depth-Wise Activation Steering for Honest Language Models
