Labeling, transforming, and structuring training data sets for machine learning