#hardtoparse: POS Tagging and Parsing the Twitterverse

Foster, Jennifer (Dublin City University) | Cetinoglu, Ozlem (Dublin City University) | Wagner, Joachim (Dublin City University) | Le Roux, Joseph (LIF - CNRS) | Hogan, Stephen (Dublin City University) | Nivre, Joakim (Uppsala University) | Hogan, Deirdre (Dublin City University) | van Genabith, Josef (Dublin City University)

AAAI Conferences 

We evaluate the statistical dependency parser, Malt, on a new dataset of sentences taken from tweets. We use a version of Malt that is trained on gold-standard Wall Street Journal (WSJ) phrase structure trees converted to Stanford labelled dependencies. We observe a drastic drop in performance moving from our in-domain WSJ test set to the new Twitter dataset, much of which is due to the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on Twitter material, results in a significant improvement. We analyse this improvement by examining in detail the effect of the retraining on individual dependency types.

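The abstract reports drops in parsing accuracy and analyses the change for individual dependency types. As a rough illustration of how such figures are commonly computed, the Python sketch below calculates unlabelled and labelled attachment scores (UAS/LAS), per-label precision, recall and F1, and LAS split by whether a token's POS tag was predicted correctly, which is one simple way to look at tagging-error propagation. It is not the authors' evaluation code: the `Token` record and all function names are hypothetical, the labels are only Stanford-style examples, trees are assumed to be already loaded and aligned token by token, and every token is scored, whereas evaluation setups differ on whether punctuation is counted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One token of a dependency tree (hypothetical, CoNLL-style fields)."""
    head: int   # 1-based index of the governor; 0 means the root
    label: str  # dependency label, e.g. a Stanford type such as "nsubj"
    pos: str    # part-of-speech tag assigned to the token


def _aligned_tokens(gold, pred):
    """Yield (gold_token, pred_token) pairs, checking that the inputs align."""
    assert len(gold) == len(pred), "different number of sentences"
    for g_sent, p_sent in zip(gold, pred):
        assert len(g_sent) == len(p_sent), "different sentence lengths"
        yield from zip(g_sent, p_sent)


def attachment_scores(gold, pred):
    """Return (UAS, LAS): shares of tokens with correct head / head and label."""
    total = head_ok = both_ok = 0
    for g, p in _aligned_tokens(gold, pred):
        total += 1
        if g.head == p.head:
            head_ok += 1
            if g.label == p.label:
                both_ok += 1
    return head_ok / total, both_ok / total


def per_label_scores(gold, pred):
    """Labelled precision, recall and F1 for each dependency type."""
    correct, n_gold, n_pred = Counter(), Counter(), Counter()
    for g, p in _aligned_tokens(gold, pred):
        n_gold[g.label] += 1
        n_pred[p.label] += 1
        if g.head == p.head and g.label == p.label:
            correct[g.label] += 1
    scores = {}
    for label in n_gold.keys() | n_pred.keys():
        prec = correct[label] / n_pred[label] if n_pred[label] else 0.0
        rec = correct[label] / n_gold[label] if n_gold[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    return scores


def las_by_tag_correctness(gold, pred):
    """LAS computed separately for tokens whose predicted POS tag is right/wrong."""
    hits, totals = Counter(), Counter()
    for g, p in _aligned_tokens(gold, pred):
        key = "tag_correct" if g.pos == p.pos else "tag_wrong"
        totals[key] += 1
        if g.head == p.head and g.label == p.label:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Toy example: one three-token sentence ("I love tweets").
    gold = [[Token(2, "nsubj", "PRP"), Token(0, "root", "VBP"), Token(2, "dobj", "NNS")]]
    pred = [[Token(2, "nsubj", "PRP"), Token(0, "root", "NN"), Token(2, "nsubj", "VBZ")]]
    print(attachment_scores(gold, pred))
    print(per_label_scores(gold, pred))
    print(las_by_tag_correctness(gold, pred))
```

Under these assumptions, scoring the WSJ-trained parser and the retrained parser against the same gold Twitter trees, and comparing their `per_label_scores` outputs, would give a per-type view of where retraining helps; reading trees from CoNLL files is left out of the sketch.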