Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning
Kestemont, Mike; De Gussem, Jeroen
The automated processing of Latin texts has long been a popular research topic, especially in the Digital Humanities community. In a variety of computational applications, such as text reuse detection [Franzini et al., 2015], it is desirable to annotate and augment Latin texts with useful morpho-syntactic or lexical information, such as lemmas. In this paper, we focus on two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization. Given a piece of Latin text, lemmatization involves assigning each word to a single dictionary headword or 'lemma': a base-form label (preferably in a normalized orthography) that groups all word tokens differing only in spelling and/or inflection [Knowles et al., 2004]. Lemmatization is closely related to part-of-speech (PoS) tagging [Jurafsky et al., 2000], in which each word in a running text is assigned a tag indicating its part of speech or word class (e.g. noun, verb or adjective).
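To make the task framing concrete, the Python sketch below treats lemmatization and PoS tagging jointly as assigning one (lemma, tag) pair to every token. It is not the deep representation learning tagger described in this paper; it is a deliberately simple most-frequent-label lookup baseline. The normalization rules (lowercasing, j to i, v to u), the toy training sentences and the tag names are illustrative assumptions.

```python
from collections import Counter, defaultdict


def normalize(token: str) -> str:
    """Hypothetical orthographic normalization for medieval Latin (assumed rules):
    lowercase the form and map j -> i and v -> u so spelling variants collapse."""
    return token.lower().replace("j", "i").replace("v", "u")


def train_baseline(tagged_sentences):
    """For each normalized form, count every (lemma, pos) pair seen in training
    and keep only the most frequent one."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token, lemma, pos in sentence:
            counts[normalize(token)][(lemma, pos)] += 1
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}


def tag(model, tokens, unknown=("<unk>", "<unk>")):
    """Assign each token exactly one (lemma, pos) pair; unseen forms get a placeholder."""
    return [(tok, *model.get(normalize(tok), unknown)) for tok in tokens]


# Toy annotated data (assumed, for illustration only): (token, lemma, PoS tag).
training = [
    [("In", "in", "ADP"), ("principio", "principium", "NOUN"),
     ("erat", "sum", "AUX"), ("verbum", "verbum", "NOUN")],
    [("Et", "et", "CCONJ"), ("verbum", "verbum", "NOUN"),
     ("erat", "sum", "AUX"), ("apud", "apud", "ADP"), ("Deum", "deus", "NOUN")],
]

model = train_baseline(training)
print(tag(model, ["Uerbum", "erat", "Deus"]))
# [('Uerbum', 'verbum', 'NOUN'), ('erat', 'sum', 'AUX'), ('Deus', '<unk>', '<unk>')]
```

The spelling variant "Uerbum" is resolved through normalization, but the unseen inflected form "Deus" is not, even though "Deum" (same lemma) occurs in training. Generalizing to such unseen forms is the kind of gap that learned representations are meant to close.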
Aug-3-2017