Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training