Aligning Large Language Models with Human Preferences through Representation Engineering
