SemDeDup: Data-efficient learning at web-scale through semantic deduplication