The Effect of Masking Strategies on Knowledge Retention by Language Models