Supplementary Information

A Collecting Internet Data

A.1 Initial Unclean Dataset Curation