Domain Specific Data Distillation and Multi-modal Embedding Generation
Peddiraju, Sharadind; Rajagopal, Srini
arXiv.org Artificial Intelligence
Creating domain-centric embeddings is difficult because unstructured data is abundant while domain-specific structured data is scarce. Conventional embedding techniques often rely on only one of these modalities, which limits their applicability and efficacy. This paper introduces a novel modeling approach that uses structured data to filter noise from unstructured data, resulting in embeddings with high precision and recall for domain-specific attribute prediction. The proposed model operates within a Hybrid Collaborative Filtering (HCF) framework, in which generic entity representations are fine-tuned through relevant item prediction tasks. Our experiments, focusing on the cloud computing domain, demonstrate that HCF-based embeddings outperform AutoEncoder-based embeddings (which use purely unstructured data), achieving a 28% lift in precision and an 11% lift in recall for domain-specific attribute prediction.
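To make the reported metrics concrete, the following is a minimal sketch of how precision, recall, and lift could be computed for a multi-label attribute prediction task. It is not the authors' evaluation code. The abstract does not say whether the metrics are micro- or macro-averaged, or whether "lift" is relative or absolute; this sketch assumes micro-averaging and relative lift. The function names and the toy attribute sets are hypothetical and chosen only for illustration.

```python
def precision_recall(predicted, actual):
    """Micro-averaged precision and recall over per-entity attribute sets.

    Assumption: micro-averaging; the abstract does not specify the scheme.
    """
    true_positives = sum(len(p & a) for p, a in zip(predicted, actual))
    n_predicted = sum(len(p) for p in predicted)
    n_actual = sum(len(a) for a in actual)
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_actual if n_actual else 0.0
    return precision, recall


def relative_lift(candidate, baseline):
    """Relative improvement of candidate over baseline, as a fraction.

    Assumption: "lift" means (candidate - baseline) / baseline.
    """
    return (candidate - baseline) / baseline


# Hypothetical toy data: one attribute set per entity (not from the paper).
actual = [{"compute", "storage"}, {"database"}, {"networking", "security"}]
baseline_pred = [{"compute", "database"}, {"database", "storage"}, {"networking"}]
candidate_pred = [{"compute", "storage"}, {"database"}, {"networking", "compute"}]

base_p, base_r = precision_recall(baseline_pred, actual)
cand_p, cand_r = precision_recall(candidate_pred, actual)

print(f"baseline  precision={base_p:.2f} recall={base_r:.2f}")
print(f"candidate precision={cand_p:.2f} recall={cand_r:.2f}")
print(f"precision lift={relative_lift(cand_p, base_p):.1%}")
print(f"recall lift={relative_lift(cand_r, base_r):.1%}")
```

Under the relative-lift reading, a 28% lift in precision would mean the HCF-based precision is 1.28 times the AutoEncoder-based precision, not 28 percentage points higher.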
Oct-26-2024
- Country:
  - Europe > Bulgaria > Varna Province > Varna (0.04)
- Genre:
  - Research Report (0.64)
- Industry:
  - Information Technology (0.93)
  - Leisure & Entertainment > Games
    - Computer Games (0.46)
- Technology:
  - Information Technology
    - Information Management (1.00)
    - Data Science > Data Mining (0.94)
    - Artificial Intelligence
      - Representation & Reasoning (1.00)
      - Natural Language (1.00)
      - Machine Learning
        - Performance Analysis > Accuracy (0.48)
        - Neural Networks > Deep Learning (0.47)