Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark

Aug-28-2024–arXiv.org Artificial Intelligence

Several systems have been developed to extract information about characters to aid computational analysis of English literature. We propose character similarity grouping as a holistic evaluation task for these pipelines. We present AustenAlike, a benchmark suite of character similarities in Jane Austen's novels. Our benchmark draws on three notions of character similarity: a structurally defined notion, a socially defined notion, and an expert-defined set extracted from literary criticism. We use AustenAlike to evaluate character features extracted using two pipelines, BookNLP and FanfictionNLP. We build character representations from four kinds of features and compare them to the three AustenAlike benchmarks and to GPT-4 similarity rankings. We find that although computational representations capture some broad similarities based on shared social and narrative roles, the expert pairings in our third benchmark are challenging for all systems, highlighting the subtler aspects of similarity noted by human readers.
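
To make the idea of a similarity-grouping evaluation concrete, the following is a minimal sketch, not the paper's implementation. It assumes each character is already represented as a numeric feature vector and that a benchmark supplies groups of characters considered similar. The character names, vectors, group labels, and the nearest-neighbour scoring rule are all hypothetical choices made for illustration; the abstract does not specify the metric actually used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_recovery_rate(vectors, groups):
    """Fraction of characters whose most similar other character is in the same group.

    `vectors` maps character name -> feature vector.
    `groups` maps group label -> list of character names (hypothetical benchmark format).
    """
    label = {name: g for g, members in groups.items() for name in members}
    hits, total = 0, 0
    for name, vec in vectors.items():
        if name not in label:
            continue
        # Score every other labelled character against this one.
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in vectors.items()
            if other != name and other in label
        ]
        if not scored:
            continue
        _, nearest = max(scored)
        hits += int(label[nearest] == label[name])
        total += 1
    return hits / total if total else 0.0


# Toy, made-up data: placeholder names and vectors, not drawn from the benchmark.
vectors = {
    "heroine_a": [1.0, 0.0, 0.2],
    "heroine_b": [0.9, 0.1, 0.3],
    "suitor_a": [0.0, 1.0, 0.1],
    "suitor_b": [0.1, 0.9, 0.0],
}
groups = {
    "heroines": ["heroine_a", "heroine_b"],
    "suitors": ["suitor_a", "suitor_b"],
}
rate = group_recovery_rate(vectors, groups)
print(rate)
```

A representation that scores well under a rule like this places characters from the same benchmark group close together. Under the same assumptions, the expert pairings described above would correspond to groups that feature-based vectors tend not to recover.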

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America
  - United States
    - Maryland (0.04)
    - California (0.04)
    - Washington > King County
      - Seattle (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.15)
    - Massachusetts > Norfolk County
      - Wellesley (0.04)
    - Colorado > Denver County
      - Denver (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Europe
  - Netherlands (0.04)
  - Sweden
    - Vaestra Goetaland > Gothenburg (0.04)
    - Uppsala County > Uppsala (0.04)
  - Spain > Valencian Community
    - Valencia Province > Valencia (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Italy
    - Tuscany > Florence (0.04)
    - Calabria > Catanzaro Province
      - Catanzaro (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
  - Bulgaria > Sofia City Province
    - Sofia (0.04)
- Asia
  - Singapore (0.04)
  - Middle East > Jordan (0.04)
  - Indonesia > Bali (0.04)
  - Japan > Honshū
    - Kansai > Osaka Prefecture
      - Osaka (0.04)
    - Chūbu > Aichi Prefecture
      - Nagoya (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Text Processing (0.97)
    - Large Language Model (0.69)
  - Machine Learning > Neural Networks
    - Deep Learning (0.36)
