MaiBaam Annotation Guidelines
Verena Blaschke, Barbara Kovačić, Siyao Peng, Barbara Plank
arXiv.org Artificial Intelligence
This document provides annotation guidelines for MaiBaam, a Bavarian corpus annotated with part-of-speech (POS) tags and syntactic dependencies. MaiBaam belongs to the Universal Dependencies (UD) project (Zeman et al., 2023; de Marneffe et al., 2021), and our annotations elaborate on the general and German UD version 2 guidelines. The document broadly follows the order in which we prepare and annotate sentences: first, preprocessing and tokenization (§1), then general recaps of POS tags (§2) and dependencies (§3), followed by annotation decisions that would also apply to German (§4), and lastly decisions that are specific to Bavarian grammar (§5). Many examples are written in German, since the standardized orthography makes it easier to search this PDF. We annotate only UD-style POS tags (UPOS tags) and dependencies, and add the SpaceAfter=No feature where appropriate; we do not add any other information (no lemmas, XPOS tags, morphological features, enhanced dependencies, or miscellaneous annotations). This document is primarily directed at present and future annotators of MaiBaam. We also publish it so that others working with MaiBaam or annotating similar data can better understand the decisions we have made.
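To make the annotation scope concrete, the following is a minimal sketch of how that scope maps onto the ten columns of the CoNLL-U format used by UD. It is not part of the MaiBaam tooling: the function name `check_token_line` and the German example sentence are hypothetical, chosen only for illustration, and the check assumes plain token lines (no multiword-token ranges or empty nodes).

```python
# The ten tab-separated columns of a CoNLL-U token line.
CONLLU_COLUMNS = [
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
]

# Columns that stay empty ("_") under the annotation scope described above.
UNANNOTATED = ("LEMMA", "XPOS", "FEATS", "DEPS")
# Columns that are expected to be filled in.
REQUIRED = ("UPOS", "HEAD", "DEPREL")


def check_token_line(line):
    """Return a list of problems found in one plain CoNLL-U token line.

    Hypothetical helper: assumes the line is a regular token, not a
    multiword-token range (e.g. "1-2") or an empty node (e.g. "1.1").
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(CONLLU_COLUMNS):
        return [f"expected {len(CONLLU_COLUMNS)} columns, got {len(fields)}"]
    row = dict(zip(CONLLU_COLUMNS, fields))

    problems = []
    for name in UNANNOTATED:
        if row[name] != "_":
            problems.append(f"{name} should be empty, found {row[name]!r}")
    # SpaceAfter=No is the only MISC annotation mentioned in the text.
    if row["MISC"] not in ("_", "SpaceAfter=No"):
        problems.append(f"unexpected MISC value {row['MISC']!r}")
    for name in REQUIRED:
        if row[name] == "_":
            problems.append(f"{name} is missing")
    return problems


# Illustrative German sentence "Das ist gut." (not taken from the corpus).
EXAMPLE = [
    "1\tDas\t_\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "2\tist\t_\tAUX\t_\t_\t3\tcop\t_\t_",
    "3\tgut\t_\tADJ\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "4\t.\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_",
]

if __name__ == "__main__":
    for token_line in EXAMPLE:
        print(token_line.split("\t")[1], check_token_line(token_line))
```

Each example line leaves the lemma, XPOS, features, and enhanced-dependency columns as `_`, and only the token directly followed by punctuation carries `SpaceAfter=No`.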
Mar-9-2024