Stress-Testing Capability Elicitation With Password-Locked Models