Clustered Hierarchical Anomaly and Outlier Detection Algorithms
Najib Ishaq, Thomas J. Howard III, Noah M. Daniels
Anomaly and outlier detection is a long-standing problem in machine learning. In some cases it is easy, for example when data are drawn from well-characterized distributions such as the Gaussian. When data occupy high-dimensional spaces, however, anomaly detection becomes more difficult. We present CLAM (Clustered Learning of Approximate Manifolds), a fast hierarchical clustering technique that learns a manifold in a Banach space defined by a distance metric. CLAM induces a graph from the cluster tree, based on overlapping clusters determined by several geometric and topological features. On these graphs we implement CHAODA (Clustered Hierarchical Anomaly and Outlier Detection Algorithms), which explores various properties of the graphs and their constituent clusters to compute anomaly scores. On 24 publicly available datasets, we compare the performance of CHAODA, measured by ROC AUC, to a variety of state-of-the-art unsupervised anomaly-detection algorithms. Six of the datasets are used for training; CHAODA outperforms the other approaches on 14 of the remaining 18.
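The pipeline described in the abstract (cluster tree, graph of overlapping clusters, anomaly scores, ROC AUC) can be illustrated with a toy sketch. This is not the authors' CLAM or CHAODA implementation. The function names, the two-pole splitting rule, the ball-overlap edge test, and the component-size score are all assumptions chosen for illustration, and the sketch assumes Euclidean distance and distinct points.

```python
import math
from itertools import combinations


def make_cluster(points):
    """Summarize points as a ball: a center (min total distance) and a radius."""
    center = min(points, key=lambda c: sum(math.dist(c, p) for p in points))
    radius = max(math.dist(center, p) for p in points)
    return {"points": points, "center": center, "radius": radius}


def partition(cluster):
    """Split a cluster in two around a pair of far-apart poles (assumed rule)."""
    pts = cluster["points"]
    left_pole = max(pts, key=lambda p: math.dist(cluster["center"], p))
    right_pole = max(pts, key=lambda p: math.dist(left_pole, p))
    left, right = [], []
    for p in pts:
        nearer_left = math.dist(p, left_pole) <= math.dist(p, right_pole)
        (left if nearer_left else right).append(p)
    return make_cluster(left), make_cluster(right)


def leaf_clusters(points, max_depth=3, min_size=2):
    """Divide recursively to a fixed depth and return the leaves of the tree."""
    frontier = [make_cluster(list(points))]
    for _ in range(max_depth):
        next_frontier = []
        for c in frontier:
            if len(c["points"]) > min_size and c["radius"] > 0:
                next_frontier.extend(partition(c))
            else:
                next_frontier.append(c)
        frontier = next_frontier
    return frontier


def component_scores(clusters):
    """Link overlapping balls, then score points by how small their component is."""
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two clusters share an edge when their balls overlap.
    for i, j in combinations(range(len(clusters)), 2):
        a, b = clusters[i], clusters[j]
        if math.dist(a["center"], b["center"]) <= a["radius"] + b["radius"]:
            parent[find(i)] = find(j)

    total = sum(len(c["points"]) for c in clusters)
    size = {}
    for i, c in enumerate(clusters):
        size[find(i)] = size.get(find(i), 0) + len(c["points"])
    # Points in small connected components receive scores close to 1.
    return {p: 1 - size[find(i)] / total
            for i, c in enumerate(clusters) for p in c["points"]}


def roc_auc(scores, labels):
    """ROC AUC as the fraction of (anomaly, normal) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: a 5x5 grid of inliers plus two planted outliers.
inliers = [(float(x), float(y)) for x in range(5) for y in range(5)]
outliers = [(20.0, 20.0), (-15.0, 18.0)]
data = inliers + outliers
labels = [0] * len(inliers) + [1] * len(outliers)

scores = component_scores(leaf_clusters(data))
auc = roc_auc([scores[p] for p in data], labels)
print(auc)
```

On this toy data the two planted outliers end up in isolated singleton clusters, so they receive the highest scores. Real datasets would need a more careful choice of tree depth, cluster selection, and scoring than this sketch makes.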
Feb-9-2021