Computational Considerations in Correcting User-Language
Renner, Adam M. (University of Memphis) | McCarthy, Philip M. (University of Memphis) | McNamara, Danielle S. (University of Memphis)
This study evaluates the robustness of established computational indices used to assess text relatedness in user-language. The original User-Language Paraphrase Corpus (ULPC) was compared to a corrected version, in which each paraphrase was corrected for typographical and grammatical errors. Error correction significantly affected the values of each of five computational indices: each index indicated greater similarity between the target sentence and the corrected paraphrase than between the target sentence and the original paraphrase. Moreover, misspelled target words accounted for a large proportion of the differences. This study also evaluated potential effects on correlations between computational indices and human ratings of paraphrases. The corrected paraphrases did not yield assessments that were any more or less comparable to those of trained human raters than did the original paraphrases containing typographical or grammatical errors. The results suggest that although correcting for errors may optimize certain computational indices, the corrections are not necessary for comparing the indices to expert ratings.
May-21-2009
- Country:
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Tennessee > Shelby County > Memphis (0.04)
- North America > United States > New Jersey > Bergen County > Mahwah (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
- Genre:
- Research Report > New Finding (0.88)