Active clustering for labeling training data