Unsupervised Scientific Abstract Segmentation with Normalized Mutual Information

Gao, Yingqiang, Lam, Jessica, Gu, Nianlong, Hahnloser, Richard H. R.

May-19-2023–arXiv.org Artificial Intelligence

The abstracts of scientific papers consist of premises and conclusions. Structured abstracts explicitly highlight the conclusion sentences, whereas in non-structured abstracts the conclusion sentences may appear at uncertain positions. Because conclusion positions are implicit, automatically segmenting scientific abstracts into premises and conclusions is a challenging task. In this work, we empirically explore using Normalized Mutual Information (NMI) for abstract segmentation. We treat each abstract as a recurrent cycle of sentences and place segmentation boundaries by greedily optimizing the NMI score between premises and conclusions. On non-structured abstracts, our proposed unsupervised approach GreedyCAS achieves the best performance across all evaluation metrics; on structured abstracts, GreedyCAS outperforms all baseline methods as measured by $P_k$. The strong correlation of NMI with our evaluation metrics indicates the effectiveness of NMI for abstract segmentation.
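
For readers unfamiliar with $P_k$, the following is a minimal sketch of the standard windowed segmentation error it denotes. It is not taken from the paper: the function names `to_segment_ids` and `pk`, the per-sentence label strings ("P" for premise, "C" for conclusion), and the default choice of `k` (half the average reference segment length, a common convention) are all assumptions made for illustration. $P_k$ is an error rate, so lower values are better.

```python
from typing import Optional, Sequence


def to_segment_ids(labels: Sequence[str]) -> list:
    """Map per-sentence labels to segment ids; a new segment starts when the label changes."""
    ids = []
    current = 0
    for i, label in enumerate(labels):
        if i > 0 and label != labels[i - 1]:
            current += 1
        ids.append(current)
    return ids


def pk(reference: Sequence[str], hypothesis: Sequence[str], k: Optional[int] = None) -> float:
    """Fraction of width-k windows where reference and hypothesis disagree
    on whether the two window ends lie in the same segment."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    ref = to_segment_ids(reference)
    hyp = to_segment_ids(hypothesis)
    n = len(ref)
    if k is None:
        # Assumed convention: half the average reference segment length.
        num_segments = ref[-1] + 1
        k = max(1, round(n / (2 * num_segments)))
    if n <= k:
        raise ValueError("sequence must be longer than the window size k")
    errors = 0
    for i in range(n - k):
        same_in_ref = ref[i] == ref[i + k]
        same_in_hyp = hyp[i] == hyp[i + k]
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / (n - k)


if __name__ == "__main__":
    # Hypothetical six-sentence abstract: the reference has two conclusion
    # sentences at the end, the hypothesis places the boundary one sentence early.
    print(pk("PPPPCC", "PPPCCC", k=2))
```

In this toy case two of the four windows disagree, so the misplaced boundary is penalized even though five of the six sentence labels match.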

machine learning, natural language, segmentation

Country:
- Europe > Switzerland > Zürich > Zürich (0.14)

Genre:
- Research Report
  - Experimental Study (0.46)
  - New Finding (0.46)

Industry:
- Health & Medicine > Therapeutic Area
  - Immunology (1.00)
  - Infections and Infectious Diseases (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning (0.68)
  - Natural Language (1.00)
  - Representation & Reasoning > Search (0.46)
