Scalable Influence and Fact Tracing for Large Language Model Pretraining