Annotation Guidelines for Corpus Novelties: Part 1 -- Named Entity Recognition

Oct-4-2024–arXiv.org Artificial Intelligence

The Novelties corpus was constituted mainly to fulfill two goals: in the short term, to train and test NER methods able to handle long texts, and in the longer term, to support the development of Renard [3], a pipeline that aims to extract character networks from literary fiction. This pipeline includes several processing steps after NER, including coreference resolution and character unification. Character networks can be used to tackle a number of tasks, including assessing literary theories, evaluating the level of historicity of a narrative, detecting roles in stories, classifying novels, identifying subplots, segmenting storylines, summarizing stories, designing recommendation systems, and aligning narratives. See the detailed survey of Labatut and Bost [11] for more information regarding character networks. This context drives the elaboration of the corpus, which explains why it exhibits certain differences from many similar NER corpora, such as CoNLL-2003 [17] or OntoNotes v5 [20]. We originally based Novelties on the literary corpus from Dekker et al. [6], as we describe in Section A of the appendix. Note that there are other literary NER corpora (cf.
