Towards Semantic Noise Cleansing of Categorical Data based on Semantic Infusion
–arXiv.org Artificial Intelligence
Semantic noise significantly affects text analytics in domain-specific industries. It impedes text understanding, which is of prime importance in critical decision-making tasks. In this work, we formalize semantic noise as a sequence of terms that do not contribute to the narrative of the text. We look beyond the notion of standard, statistically based stop words and consider the semantics of terms in order to exclude semantic noise. We present a novel Semantic Infusion technique that associates metadata with categorical corpus text, and we demonstrate its near-lossless nature. Based on this technique, we propose an unsupervised text-preprocessing framework that filters semantic noise using the context of the terms. Finally, we present evaluation results for the proposed framework on a web forum dataset from the automobile domain.
Feb-6-2020
- Country:
- North America > United States
- Colorado (0.04)
- Asia > India
- Genre:
- Research Report (0.50)
- Industry:
- Automobiles & Trucks (0.50)
- Transportation > Ground
- Road (0.52)
- Technology: