Design and implementation of an open source Greek POS Tagger and Entity Recognizer using spaCy
Partalidou, Eleni; Spyromitros-Xioufis, Eleftherios; Doropoulos, Stavros; Vologiannidis, Stavros; Diamantaras, Konstantinos I.
This paper proposes a machine learning approach to part-of-speech tagging and named entity recognition for Greek, focusing on the extraction of morphological features and the classification of tokens into a small set of named entity classes. The model architecture is introduced. Greek language support was added to the spaCy source code, a feature that did not exist before our contribution, and was used to build the models. A part-of-speech tagger was trained that can also detect the morphology of tokens and that outperforms the state-of-the-art results when classifying only the part of speech. For named entity recognition with spaCy, a model was built that extends the standard ENAMEX types (organization, location, person). Experiments indicate the need for flexibility in handling out-of-vocabulary words, and an effort is made to resolve this issue. Finally, the evaluation results are discussed.
Dec-5-2019
- Country:
  - Europe
    - Greece > Central Macedonia
      - Thessaloniki (0.05)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Sweden > Vaestra Goetaland
      - Gothenburg (0.04)
  - North America
    - Canada > British Columbia (0.04)
    - United States (0.04)
- Genre:
  - Research Report (0.40)
- Technology: