Toward Preference-aligned Large Language Models via Residual-based Model Steering
