Representation Bending for Large Language Model Safety
