MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning