Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Open in new window