Improving Text Relationship Modeling with Artificial Data
Organisciak, Peter; Ryan, Maggie
arXiv.org Artificial Intelligence
Identifying whole/part relationships between books in digital libraries can be a valuable tool for better understanding and cataloging the works found in bibliographic collections, irrespective of the form in which they were printed. However, this relationship is difficult to learn computationally because of limited ground truth availability. In this paper, we present an approach for data augmentation of whole/part training data through the use of artificially generated books. Artificial data is found to be a robust approach to training deep neural network classifiers on books with limited real ground truth, working to prevent over-fitting and improving classification by 91.0%. Modern cataloging standards support encoding complex work-level relationships, opening the possibility for bibliographic collections that better represent the complex ways that works are changed, iterated, and collated in library books.
Oct-27-2020
- Country:
  - North America > United States (0.04)
  - Oceania > Australia
  - Europe
    - Germany > Bavaria
      - Upper Bavaria > Munich (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
  - Asia > China
    - Hubei Province > Wuhan (0.04)
- Genre:
  - Research Report (1.00)
- Industry:
  - Education (1.00)