uses the final accuracy of the SGD as a sanity check for the quality of models trained with AutoAssist (e.g., BLEU