A Formal Framework for Fluency-based Multi-Reference Evaluation in Grammatical Error Correction