Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study