Influence Scores at Scale for Efficient Language Data Sampling