Improving Topic Segmentation by Injecting Discourse Dependencies

Linzi Xing, Patrick Huber, Giuseppe Carenini

Sep-18-2022–arXiv.org Artificial Intelligence

Recent neural supervised topic segmentation models are substantially more effective than unsupervised methods, owing to the availability of large-scale training corpora sampled from Wikipedia. These models may, however, suffer from limited robustness and transferability because they exploit simple linguistic cues for prediction while overlooking the more important inter-sentential topical consistency. To address this issue, we present a discourse-aware neural topic segmentation model that injects above-sentence discourse dependency structures to encourage the model to base its topic boundary predictions more on the topical consistency between sentences. Our empirical study on English evaluation datasets shows that injecting above-sentence discourse structures into a neural topic segmenter with our proposed strategy can substantially improve its performance on intra-domain and out-of-domain data, with little increase in model complexity.

Figure 1: An example article about Cholinergic Urticaria (CU) sampled from the en_disease portion of Wiki-Section dataset (Arnold et al., 2019). Left: discourse dependency structure predicted by the Sent-First discourse parser (Zhou and Feng, 2022).
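
The abstract does not say how the discourse structures are injected, so the following is only a toy sketch of the general idea, not the authors' architecture. It assumes that each sentence already has an embedding and that a discourse parser has supplied one head index per sentence. Each sentence vector is averaged with its discourse neighbours, and each gap between adjacent sentences is then scored by how dissimilar the enriched vectors are. All function names, the mean-aggregation step, and the cosine-based score are hypothetical choices made for illustration.

```python
from typing import List

Vector = List[float]


def adjacency_from_heads(heads: List[int]) -> List[List[float]]:
    """Symmetric adjacency matrix with self-loops from dependency heads.

    heads[i] is the index of sentence i's head sentence, or -1 for the root.
    (Hypothetical input format, assumed for this sketch.)
    """
    n = len(heads)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = 1.0
            adj[head][child] = 1.0
    return adj


def propagate(embeddings: List[Vector], adj: List[List[float]]) -> List[Vector]:
    """One round of mean aggregation over each sentence's discourse neighbours."""
    dim = len(embeddings[0])
    enriched = []
    for row in adj:
        degree = sum(row)  # at least 1.0 because of the self-loop
        enriched.append(
            [sum(w * emb[d] for w, emb in zip(row, embeddings)) / degree
             for d in range(dim)]
        )
    return enriched


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def boundary_scores(embeddings: List[Vector], heads: List[int]) -> List[float]:
    """Score each gap between adjacent sentences; higher means more boundary-like."""
    enriched = propagate(embeddings, adjacency_from_heads(heads))
    return [1.0 - cosine(enriched[i], enriched[i + 1])
            for i in range(len(enriched) - 1)]


# Toy input: sentences 0-1 share one topic, sentences 2-3 another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_heads = [-1, 0, 0, 2]  # sentence 0 is the root of the dependency tree
scores = boundary_scores(toy_embeddings, toy_heads)
predicted_boundary = scores.index(max(scores))  # gap after this sentence index
```

In this toy input the highest score falls on the gap between the second and third sentences, which is where the topic changes. A trained segmenter would learn both the sentence representations and the boundary classifier rather than rely on a fixed similarity score.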

computational linguistic, machine learning, natural language

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language
    - Discourse & Dialogue (0.68)
    - Text Processing (0.68)
  - Machine Learning
    - Neural Networks (0.68)
    - Statistical Learning (0.46)
