Diagnosing Vision-and-Language Navigation: What Really Matters