How to Select Datapoints for Efficient Human Evaluation of NLG Models?