CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks