Assessing Evaluation Metrics for Neural Test Oracle Generation