Under the Surface: Tracking the Artifactuality of LLM-Generated Data