Clustering to Reduce Spatial Data Set Size

Boeing, Geoff

arXiv.org Machine Learning 

Traditionally, researchers lacked access to enough spatial data to answer pressing research questions or build compelling visualizations. Today, however, the problem is often that we have too much data. Spatially redundant or approximately redundant points may refer to a single feature (plus noise) rather than to many distinct spatial features. We can use density-based clustering to compress such spatial data into a set of representative features. This paper demonstrates how to reduce the size of a spatial data set of GPS latitude-longitude coordinates using the Python programming language and scikit-learn's implementation of the DBSCAN density-based clustering algorithm. DBSCAN works very well in low-dimensional spaces, such as the two-dimensional feature space in this geospatial example.
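The approach described in the abstract can be sketched as follows. This is a minimal illustration rather than the paper's own code: the function name reduce_points, the 1.5 km neighborhood radius, the min_samples value of 1, and the sample coordinates are all assumptions chosen for the example. It relies on scikit-learn's DBSCAN with the haversine metric, which expects coordinates in radians, and it keeps the cluster member closest to each cluster's centroid as the representative point.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, used to convert km to radians


def reduce_points(coords_deg, eps_km=1.5, min_samples=1):
    """Return one representative (lat, lon) row per DBSCAN cluster.

    coords_deg: array-like of [latitude, longitude] pairs in degrees.
    eps_km and min_samples are illustrative defaults, not values from the paper.
    """
    coords_deg = np.asarray(coords_deg, dtype=float)
    db = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # haversine distances are in radians
        min_samples=min_samples,
        algorithm="ball_tree",
        metric="haversine",
    )
    labels = db.fit_predict(np.radians(coords_deg))

    representatives = []
    for label in sorted(set(labels)):
        if label == -1:
            # Noise points are dropped here; none occur when min_samples=1.
            continue
        members = coords_deg[labels == label]
        # Arithmetic mean of degrees: an approximation that is reasonable
        # for small clusters away from the poles and the antimeridian.
        centroid = members.mean(axis=0)
        nearest = members[np.argmin(((members - centroid) ** 2).sum(axis=1))]
        representatives.append(nearest)
    return np.array(representatives)


# Hypothetical sample: three near-duplicate points, two near-duplicates, one lone point.
points = [
    (41.3851, 2.1734), (41.3853, 2.1737), (41.3849, 2.1731),
    (40.4168, -3.7038), (40.4170, -3.7035),
    (38.7223, -9.1393),
]
reduced = reduce_points(points)
print(f"{len(points)} points reduced to {len(reduced)} representatives")
```

Returning an actual member of each cluster, rather than the computed centroid, means every output coordinate is a location that was really observed. Setting min_samples above 1 would cause isolated points to be labeled as noise and, in this sketch, discarded.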
