Improving Large-Scale k-Nearest Neighbor Text Categorization with Label Autoencoders
Ribadas-Pena, Francisco J., Cao, Shuyuan, Bilbao, Víctor M. Darriba
–arXiv.org Artificial Intelligence
In this paper, we introduce a multi-label lazy learning approach to automatic semantic indexing of large document collections whose label vocabularies are complex, structured, and highly inter-correlated. The proposed method extends the traditional k-Nearest Neighbors algorithm with a large autoencoder trained to map the large label space to a reduced-size latent space and to regenerate the predicted labels from that latent space. We evaluated our proposal on a large portion of the MEDLINE biomedical document collection, which uses the Medical Subject Headings (MeSH) thesaurus as a controlled vocabulary. In our experiments we propose and evaluate several document representation approaches and different label autoencoder configurations.
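As a rough illustration of the idea in the abstract, the following Python sketch retrieves the k most similar training documents, encodes their label sets into a small latent space, averages them there, and decodes the result back into label scores. Everything in it is an assumption made for illustration: the abstract does not say how neighbor labels are combined, so averaging in latent space is only one plausible reading, and `toy_encode` / `toy_decode` are hand-written stand-ins for a trained autoencoder, not the authors' model. The function names, the cosine similarity, and the 0.5 threshold are likewise hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_latent_predict(query, train_docs, train_labels, encode, decode,
                       k=2, threshold=0.5):
    """Predict label indices for `query` from its k nearest training documents.

    Assumption: neighbor label vectors are averaged in the latent space and
    then decoded; the paper may combine them differently.
    """
    # Rank training documents by similarity to the query, most similar first.
    ranked = sorted(range(len(train_docs)),
                    key=lambda i: cosine(query, train_docs[i]),
                    reverse=True)
    neighbors = ranked[:k]
    # Encode each neighbor's label vector into the reduced latent space.
    latents = [encode(train_labels[i]) for i in neighbors]
    dim = len(latents[0])
    mean_latent = [sum(z[d] for z in latents) / len(latents) for d in range(dim)]
    # Decode back to the full label space and keep labels above the threshold.
    scores = decode(mean_latent)
    return [j for j, s in enumerate(scores) if s >= threshold]


# Hypothetical stand-in for a trained label autoencoder: labels 0/1 and 2/3
# are treated as correlated pairs and share one latent dimension each.
def toy_encode(y):
    return [(y[0] + y[1]) / 2, (y[2] + y[3]) / 2]


def toy_decode(z):
    return [z[0], z[0], z[1], z[1]]


# Made-up document vectors and 4-label binary annotations.
train_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
train_labels = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]

predicted = knn_latent_predict([1.0, 0.05, 0.0], train_docs, train_labels,
                               toy_encode, toy_decode, k=2)
print(predicted)
```

In this toy setup the second neighbor carries only label 0, yet the decoder also restores label 1 because the two share a latent dimension; this is meant to hint at how a latent label space might exploit inter-label correlation, not to reproduce the paper's results.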
Feb-2-2024
- Country:
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Asia > China
- Hong Kong (0.04)
- Europe
- North America > United States
- California > San Francisco County
- San Francisco (0.14)
- Minnesota > Hennepin County
- Minneapolis (0.28)
- New York > New York County
- New York City (0.04)
- Oceania > New Zealand
- North Island > Auckland Region > Auckland (0.04)
- South America > Brazil
- Rio de Janeiro > Rio de Janeiro (0.04)
- Genre:
- Research Report > New Finding (0.87)
- Industry:
- Health & Medicine (1.00)
- Technology: