#hardtoparse: POS Tagging and Parsing the Twitterverse
Foster, Jennifer (Dublin City University) | Cetinoglu, Ozlem (Dublin City University) | Wagner, Joachim (Dublin City University) | Le Roux, Joseph (LIF - CNRS) | Hogan, Stephen (Dublin City University) | Nivre, Joakim (Uppsala University) | Hogan, Deirdre (Dublin City University) | van Genabith, Josef (Dublin City University)
We evaluate the statistical dependency parser Malt on a new dataset of sentences taken from tweets. We use a version of Malt trained on gold-standard phrase structure Wall Street Journal (WSJ) trees converted to Stanford labelled dependencies. We observe a drastic drop in performance when moving from our in-domain WSJ test set to the new Twitter dataset, much of which stems from the propagation of part-of-speech tagging errors. Retraining Malt on dependency trees produced by a state-of-the-art phrase structure parser, which has itself been self-trained on Twitter material, results in a significant improvement. We analyse this improvement by examining in detail the effect of the retraining on individual dependency types.
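The per-dependency-type analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes each token is represented as a `(head_index, label)` pair and that gold and predicted analyses are aligned token by token. The function name `per_label_scores`, the input format, and the toy labels are illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter


def per_label_scores(gold, pred):
    """Return {label: (precision, recall, f1)} for labelled dependencies.

    gold, pred: equal-length lists of (head_index, label) pairs, one per
    token. A predicted dependency counts as correct only when both the
    head and the label match the gold dependency for that token.
    """
    correct, gold_n, pred_n = Counter(), Counter(), Counter()
    for (g_head, g_label), (p_head, p_label) in zip(gold, pred):
        gold_n[g_label] += 1
        pred_n[p_label] += 1
        if g_head == p_head and g_label == p_label:
            correct[g_label] += 1

    scores = {}
    for label in set(gold_n) | set(pred_n):
        p = correct[label] / pred_n[label] if pred_n[label] else 0.0
        r = correct[label] / gold_n[label] if gold_n[label] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        scores[label] = (p, r, f)
    return scores


# Toy three-token sentence (hypothetical data): the parser mislabels
# the object of the verb as a second subject.
gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nsubj")]

for label, (p, r, f) in sorted(per_label_scores(gold, pred).items()):
    print(f"{label}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Running the same function on the output of a parser before and after retraining, and comparing the scores label by label, would give the kind of per-type breakdown the abstract describes.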
Aug-8-2011
- Country:
  - Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
  - Europe > Ireland (0.04)
  - Europe > Sweden > Uppsala County > Uppsala (0.04)
  - North America > United States > Pennsylvania (0.04)
- Industry:
  - Information Technology > Services (0.47)
  - Media > News (0.34)
- Technology: