Balancing Quality and Variation: Spam Filtering Distorts Data Label Distributions