Investigating Data Contamination for Pre-training Language Models