Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset