Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping
