Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models