LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring