Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level