Shared Parameter Subspaces and Cross-Task Linearity in Emergently Misaligned Behavior
