DataMan: Data Manager for Pre-training Large Language Models