Using Z-values to efficiently compute k-nearest neighbors for Apache Flink – Insight Data
In an earlier post, I described work that I began as an Insight Data Engineering Fellow: an efficient exact k-nearest neighbors (KNN) query using quadtrees, which is now merged into Flink's master branch. I have since worked on an approximate version of the KNN algorithm, and here I discuss one method I used for it, Z-value based hashing. For large, high-dimensional data sets, an exact KNN query can become infeasible. Many algorithms address this by hashing the points to a lower-dimensional space.
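To make the idea concrete, here is a minimal Python sketch of Z-value (Morton order) hashing followed by an approximate neighbor lookup. It is not the Flink implementation. The names `z_value` and `approx_knn`, the `bits` and `window` parameters, and the assumption that coordinates are already non-negative integers are all illustrative choices made for this example.

```python
import bisect


def z_value(coords, bits=8):
    """Interleave the low `bits` bits of each non-negative integer coordinate.

    The result is a single integer (the Z-value) that maps a
    d-dimensional point onto a one-dimensional curve.
    """
    z = 0
    d = len(coords)
    for bit in range(bits):
        for i, c in enumerate(coords):
            # Place bit `bit` of coordinate i at position bit * d + i.
            z |= ((c >> bit) & 1) << (bit * d + i)
    return z


def approx_knn(points, query, k, bits=8, window=None):
    """Approximate KNN: sort by Z-value, then rank a small window exactly.

    `window` (hypothetical parameter) is how many points on each side of
    the query's position in Z-order are kept as candidates.
    """
    if window is None:
        window = 2 * k
    keyed = sorted((z_value(p, bits), p) for p in points)
    keys = [z for z, _ in keyed]
    pos = bisect.bisect_left(keys, z_value(query, bits))
    lo = max(0, pos - window)
    hi = min(len(keyed), pos + window)
    candidates = [p for _, p in keyed[lo:hi]]
    # Rank only the candidates by true squared Euclidean distance.
    candidates.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, query)))
    return candidates[:k]


points = [(0, 0), (1, 1), (2, 2), (7, 7), (6, 7)]
nearest = approx_knn(points, query=(6, 6), k=2)
```

Points that are close along the Z-order curve are often, though not always, close in the original space, which is why the result is approximate. A larger `window` trades speed for accuracy.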
Sep-5-2016, 06:35:31 GMT