Generative and reproducible benchmarks for comprehensive evaluation of machine learning classifiers