Using Supervised Learning to Classify Metadata of Research Data by Discipline of Research

Weber, Tobias, Kranzlmüller, Dieter, Fromm, Michael, de Sousa, Nelson Tavares

Oct-16-2019, arXiv.org Machine Learning

Automated classification of metadata of research data by their discipline(s) of research can be used in scientometric research, by repository service providers, and in the context of research data aggregation services. Openly available metadata of the DataCite index for research data were used to compile a large training and evaluation set comprising 609,524 records, which is published alongside this paper. These data allow classification approaches, such as tree-based models and neural networks, to be assessed reproducibly. According to our experiments with 20 base classes (multi-label classification), multi-layer perceptron models perform best, with an f1-macro score of 0.760, closely followed by Long Short-Term Memory models (f1-macro score of 0.755). Possible applications of the trained classification models are the quantitative analysis of trends towards interdisciplinarity of digital scholarly output and the characterization of growth patterns of research data, stratified by discipline of research. Both applications can be carried out at scale with the proposed models, which are available for re-use.
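The f1-macro score reported above averages the per-class F1 over all classes, so every discipline counts equally regardless of how many records it has. The following is a minimal sketch of that computation for multi-label predictions, not the authors' evaluation code. The function name `f1_macro`, the toy matrices (3 labels instead of the paper's 20), and the convention that a class with no true and no predicted positives scores 0 are all assumptions made for illustration.

```python
from typing import List, Sequence


def f1_macro(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Macro-averaged F1 over the label columns of two binary indicator matrices.

    Each row is one record; each column is one class (1 = class assigned).
    """
    n_labels = len(y_true[0])
    scores: List[float] = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no positives at all scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    # Unweighted mean: rare and frequent classes contribute equally.
    return sum(scores) / n_labels


# Hypothetical toy data: 4 records, 3 classes; a record may carry several labels.
Y_TRUE = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
Y_PRED = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

if __name__ == "__main__":
    print(round(f1_macro(Y_TRUE, Y_PRED), 4))
```

On this toy data the per-class F1 values are 1, 2/3, and 2/3, so the macro average is 7/9. One missed label and one spurious label each lower the score for their class only.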

classification scheme, metadata, research data, (13 more...)


Country:
- Oceania > New Zealand (0.04)
- North America > United States
  - Oregon (0.04)
  - California > San Diego County
    - San Diego (0.04)
- Europe
  - Portugal > Porto
    - Porto (0.04)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Performance Analysis > Accuracy (0.47)
  - Neural Networks
    - Deep Learning (0.68)
    - Perceptrons (0.54)
