From Narrow Unlearning to Emergent Misalignment: Causes, Consequences, and Containment in LLMs
