Following the machine translation community's recent adoption of automatic evaluation using the BLEU/NIST scoring process, we conduct an in-depth study of a similar idea for evaluating summaries. The results show that automatic evaluation using unigram co-occurrences between summary pairs, i.e. ROUGE, correlates surprisingly well with human evaluations according to various statistical metrics, whereas direct application of the BLEU evaluation procedure does not always give good results. For the inception of ROUGE, please read Lin and Hovy's HLT-NAACL 2003 paper (Lin and Hovy 2003). For more details, please read Lin's paper "ROUGE: A Package for Automatic Evaluation of Summaries" (Lin 2004a).
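To make the idea of unigram co-occurrence concrete, here is a minimal sketch of a recall-style unigram overlap score between a candidate summary and a single reference summary. It is an illustration only, not the ROUGE package itself: the function name `unigram_recall`, the lowercasing, and the whitespace tokenization are all assumptions made for this example, and it handles only one reference.

```python
from collections import Counter


def unigram_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams that also occur in the candidate.

    Hypothetical helper for illustration. Tokenization is a plain
    lowercase whitespace split (an assumption, not the package's rule).
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    if not ref_counts:
        return 0.0
    # Clip each match so a repeated candidate word cannot be credited
    # more often than it appears in the reference.
    overlap = sum(
        min(count, cand_counts[token]) for token, count in ref_counts.items()
    )
    return overlap / sum(ref_counts.values())


score = unigram_recall("the cat sat on the mat", "the cat was on the mat")
print(round(score, 3))
```

In this example five of the six reference tokens are matched, so the score is 5/6. Dividing by the reference length makes the score recall-oriented, in contrast to BLEU, which is built on precision over the candidate's n-grams.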
Jan-18-2017, 10:21:44 GMT
- Technology: