Outlier Detection in High Dimensional Data

Kamalov, Firuz; Leung, Ho Hon

arXiv.org Artificial Intelligence 

Most existing algorithms fail to properly address the issues stemming from a large number of features. In particular, outlier detection algorithms perform poorly on small data sets with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method is designed to address the challenges of high-dimensional data by projecting the original data onto a lower-dimensional space and using the innate structure of the data to calculate an anomaly score for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. Our method also produces better-than-average execution times compared to the benchmark methods.

Despite the growing amount of data that has become available for research and discovery, there remain areas where certain types of data are scarce. In fields such as medical diagnostics, network intrusion detection, detection of fraudulent financial transactions, and many others, deviations from normal behavior, i.e., anomalies, are rare. However, these events are often of the greatest importance. For example, it would be extremely beneficial to determine whether a person has an illness based on abnormal lab results.
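
The general idea named in the abstract, projecting the data with principal component analysis and then scoring each point by a kernel density estimate, can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here. The function name `pca_kde_scores`, the choice of two components, the Scott's-rule bandwidth, and the leave-one-out density are all assumptions made for illustration.

```python
import numpy as np

def pca_kde_scores(X, n_components=2, bandwidth=None):
    """Hypothetical helper: higher score = lower estimated density = more anomalous."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)

    # PCA via SVD of the centered data; rows of Vt are principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T

    # Standardize each projected coordinate so one bandwidth fits all axes.
    std = Z.std(axis=0)
    std[std == 0] = 1.0
    Z = Z / std
    d = Z.shape[1]

    # Assumed bandwidth: Scott's rule for unit-variance data.
    if bandwidth is None:
        bandwidth = n ** (-1.0 / (d + 4))

    # Gaussian kernel on pairwise squared distances in the projected space.
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(K, 0.0)  # leave-one-out: a point does not vouch for itself

    norm = (n - 1) * (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    density = K.sum(axis=1) / norm
    return -np.log(density + 1e-300)  # small constant avoids log(0)

# Synthetic example: 200 Gaussian points in 50 dimensions plus one shifted point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 50)), np.full((1, 50), 8.0)])
scores = pca_kde_scores(X, n_components=2)
print(int(np.argmax(scores)))  # index of the most anomalous point
```

In this toy setting the shifted point dominates the leading principal direction, so it lands far from the bulk of the projected data and receives the lowest density. Real data would likely need a more careful choice of the number of components and the bandwidth.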
