Classification of cancer pathology reports: a large-scale comparative study

Martina, Stefano, Ventura, Leonardo, Frasconi, Paolo

Jun-29-2020–arXiv.org Machine Learning

We report on the application of state-of-the-art deep learning techniques to the automatic and interpretable assignment of ICD-O3 topography and morphology codes to free-text cancer reports. We present results on a large dataset (more than 80 000 labeled and 1 500 000 unlabeled anonymized reports written in Italian and collected from hospitals in Tuscany over more than a decade) and with a large number of classes (134 morphological classes and 61 topographical classes). We compare alternative architectures in terms of prediction accuracy and interpretability and show that our best model achieves a multiclass accuracy of 90.3% on topography site assignment and 84.8% on morphology type assignment. We found that, in this context, hierarchical models are no better than flat models and that an element-wise maximum aggregator is slightly better than attentive models on site classification. Moreover, the maximum aggregator offers a way to interpret the classification process.
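To make the interpretability remark concrete, here is a minimal sketch of an element-wise maximum aggregator. It is an illustration only and not the authors' implementation: the function names `max_aggregate` and `token_contributions`, the toy tokens, and the 4-dimensional vectors are all assumptions invented for this example. The idea it shows is that each pooled feature comes from exactly one token, so recording which token supplied each maximum gives a rough picture of which words drove the prediction.

```python
from collections import Counter


def max_aggregate(token_vectors):
    """Element-wise maximum over a sequence of equal-length token vectors.

    Returns (pooled, argmax): pooled[j] is the largest value of feature j
    across all tokens, and argmax[j] is the index of the first token that
    attains that value.
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    pooled, argmax = [], []
    for j in range(dim):
        column = [vec[j] for vec in token_vectors]
        # max() over indices returns the first index in case of ties.
        best = max(range(len(column)), key=column.__getitem__)
        pooled.append(column[best])
        argmax.append(best)
    return pooled, argmax


def token_contributions(tokens, argmax):
    """Count how many pooled features each token position supplied."""
    counts = Counter(argmax)
    return [(tok, counts.get(i, 0)) for i, tok in enumerate(tokens)]


# Toy data, invented for illustration (not taken from the paper).
tokens = ["carcinoma", "of", "colon"]
vectors = [
    [0.9, 0.1, 0.2, 0.8],
    [0.0, 0.2, 0.1, 0.1],
    [0.3, 0.7, 0.6, 0.2],
]

pooled, argmax = max_aggregate(vectors)
contributions = token_contributions(tokens, argmax)
print(pooled)
print(contributions)
```

In this toy case the function word supplies none of the pooled features, while the two content words supply two each. In a real model the token vectors would come from a learned encoder, and the pooled vector would feed a classifier over the topography or morphology classes.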

deep learning, neural network, representation

Country:
- Europe > Italy
  - Tuscany (0.24)
- North America > United States (0.93)

Genre:
- Research Report
  - Experimental Study (0.67)
  - New Finding (0.93)

Industry:
- Health & Medicine
  - Diagnostic Medicine (0.66)
  - Health Care Providers & Services (0.87)
  - Therapeutic Area > Oncology (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (1.00)
  - Statistical Learning (1.00)
