Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation