Understanding Dataset Design Choices for Multi-hop Reasoning