MaiBaam Annotation Guidelines

Verena Blaschke, Barbara Kovačić, Siyao Peng, Barbara Plank

arXiv.org Artificial Intelligence 

This document provides annotation guidelines for MaiBaam, a Bavarian corpus annotated with part-of-speech (POS) tags and syntactic dependencies. MaiBaam belongs to the Universal Dependencies (UD) project (Zeman et al., 2023; de Marneffe et al., 2021), and our annotations elaborate on the general and German UD version 2 guidelines. This document is structured broadly in the order we prepare and annotate sentences: first, preprocessing and tokenization (§1), then general recaps of POS tags (§2) and dependencies (§3), before we go into annotation decisions that would also apply to German (§4) and lastly decisions that are specific to Bavarian grammar (§5). Many examples are written in German, since the standardized orthography makes it easier to search this PDF. We only annotate UD-style POS tags (UPOS tags) and dependencies and add the SpaceAfter=No feature where appropriate, but do not add any other information (no lemma, XPOS tags, morphological features, enhanced dependencies or miscellaneous annotations). This document is primarily directed at present and future annotators of MaiBaam. We publish it to additionally allow others working with MaiBaam or annotating similar data to better understand the decisions we have made.
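To make the annotation scope concrete, the following minimal Python sketch checks whether token lines in the ten-column CoNLL-U format used across UD carry only the information described above. The example sentence is a hypothetical German illustration and is not taken from MaiBaam; the helper names `parse_token` and `follows_scheme` are ours, and the sketch assumes that unannotated columns are left as the usual `_` placeholder.

```python
# Column names of the ten-column CoNLL-U format used across Universal Dependencies.
COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
           "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

# Hypothetical German example ("Das ist gut."), NOT taken from MaiBaam.
# Only UPOS, HEAD and DEPREL are filled in; SpaceAfter=No marks that
# no space follows "gut" before the full stop.
SAMPLE = "\n".join([
    "1\tDas\t_\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "2\tist\t_\tAUX\t_\t_\t3\tcop\t_\t_",
    "3\tgut\t_\tADJ\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "4\t.\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])


def parse_token(line):
    """Split one tab-separated CoNLL-U token line into a column dict."""
    fields = line.split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def follows_scheme(token):
    """Return True if the token carries only UPOS, HEAD, DEPREL and,
    optionally, SpaceAfter=No (assumed reading of the scope above)."""
    unannotated = all(token[c] == "_" for c in ("LEMMA", "XPOS", "FEATS", "DEPS"))
    annotated = all(token[c] != "_" for c in ("UPOS", "HEAD", "DEPREL"))
    misc_ok = token["MISC"] in ("_", "SpaceAfter=No")
    return unannotated and annotated and misc_ok


tokens = [parse_token(line) for line in SAMPLE.splitlines()]
for token in tokens:
    print(token["FORM"], token["UPOS"], token["HEAD"], token["DEPREL"],
          follows_scheme(token))
```

A check of this kind could be run over a file before release to catch stray lemmas or features; the actual MaiBaam files may differ in details such as comment lines or multiword tokens, which this sketch does not handle.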
