Outlier Detection in High Dimensional Data

Sep-9-2019–arXiv.org Artificial Intelligence

Most existing outlier detection algorithms fail to properly address the issues stemming from a large number of features. In particular, they perform poorly on small data sets with a large number of features. In this paper, we propose a novel outlier detection algorithm based on principal component analysis and kernel density estimation. The proposed method is designed to address the challenges of high-dimensional data by projecting the original data onto a lower-dimensional space and using the innate structure of the data to calculate an anomaly score for each data point. Numerical experiments on synthetic and real-life data show that our method performs well on high-dimensional data. Our method also produces better-than-average execution times compared to the benchmark methods.

Despite the growing amount of data available for research and discovery, there remain areas where certain types of data are scarce. In fields such as medical diagnostics, network intrusion detection, and the detection of fraudulent financial transactions, deviations from normal behavior, i.e. anomalies, are rare. Yet these events are often the most important ones. For example, it would be extremely beneficial to determine whether a person has an illness based on abnormal lab results.
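The general idea named in the abstract, projecting the data with principal component analysis and then scoring each point by a kernel density estimate, can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in this excerpt. The function name `pca_kde_scores`, the choice of two components, the standardization of the projected coordinates, the Gaussian kernel, the Scott's-rule bandwidth, and the leave-one-out density are all assumptions made for illustration.

```python
import numpy as np

def pca_kde_scores(X, n_components=2, bandwidth=None):
    """Hypothetical helper: anomaly score = negative log density in PCA space."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Center the data and project it onto the leading principal components (via SVD).
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ vt[:n_components].T

    # Assumption: standardize each projected coordinate so one bandwidth fits all.
    Z = Z / (Z.std(axis=0) + 1e-12)
    d = Z.shape[1]

    # Assumption: Scott's rule of thumb for the kernel bandwidth.
    if bandwidth is None:
        bandwidth = n ** (-1.0 / (d + 4))

    # Gaussian kernel density estimate at every point, leaving the point itself out.
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-0.5 * sq_dist / bandwidth**2)
    np.fill_diagonal(K, 0.0)
    norm = (n - 1) * (2.0 * np.pi * bandwidth**2) ** (d / 2.0)
    density = K.sum(axis=1) / norm

    # Low estimated density means a high anomaly score.
    return -np.log(density + 1e-300)

# Usage: 60 samples with 50 features, where the first sample is shifted far away.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 50))
X[0] += 10.0
scores = pca_kde_scores(X)
print("most anomalous index:", int(np.argmax(scores)))
```

The pairwise distance matrix makes this sketch quadratic in the number of samples, which is acceptable for the small-sample, many-feature setting the abstract describes but would need a different density estimator for large data sets.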

artificial intelligence, data mining, machine learning

Country:
- Asia > Middle East > UAE (0.14)

Genre:
- Research Report (1.00)

Industry:
- Health & Medicine (0.48)
- Banking & Finance (0.48)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Anomaly Detection (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning (1.00)
    - Performance Analysis > Accuracy (0.94)
