Summarising Historical Text in Modern Languages

Peng, Xutan, Zheng, Yi, Lin, Chenghua, Siddharthan, Advaith

Jan-26-2021–arXiv.org Artificial Intelligence

We introduce the task of historical text summarisation, where documents in historical forms of a language are summarised in the corresponding modern language. This is a fundamentally important routine for historians and digital humanities researchers, but it has never been automated. We compile a high-quality gold-standard text summarisation dataset, which consists of historical German and Chinese news from hundreds of years ago summarised in modern German or Chinese. Based on cross-lingual transfer learning techniques, we propose a summarisation model that can be trained even with no cross-lingual (historical-to-modern) parallel data, and further benchmark it against state-of-the-art algorithms. We report automatic and human evaluations that distinguish the historical-to-modern summarisation task from standard cross-lingual summarisation (i.e., modern-to-modern), highlight the distinctness and value of our dataset, and demonstrate that our transfer learning approach outperforms standard cross-lingual benchmarks on this task.
