Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition

Amalvy, Arthur, Labatut, Vincent

arXiv.org Artificial Intelligence 

The Novelties corpus was constituted mainly to fulfill two goals: in the short term, to train and test NER methods able to handle long texts, and in the longer term, to support the development of Renard [3], a pipeline that aims to extract character networks from literary fiction. This pipeline includes several processing steps after NER, including coreference resolution and character unification. Character networks can be used to tackle a number of tasks, including assessing literary theories, estimating the level of historicity of a narrative, detecting roles in stories, classifying novels, identifying subplots, segmenting storylines, summarizing stories, designing recommendation systems, and aligning narratives. See the detailed survey of Labatut and Bost [11] for more information regarding character networks. This context drives the design of the corpus, which explains why it differs in certain respects from many similar NER corpora, such as CoNLL-2003 [17] or OntoNotes v5 [20]. We originally based Novelties on the literary corpus from Dekker et al. [6], as we describe in Section A of the appendix. Note that there are other literary NER corpora (cf.
