Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models