Sketching Datasets for Large-Scale Learning (long version)