Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science

Linderman, George C., Mishne, Gal, Kluger, Yuval, Steinerberger, Stefan

Nov-13-2017–arXiv.org Machine Learning

If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $k-$nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in $[0,1]^d$ it suffices to connect every point to $ c_{d,1} \log{\log{n}}$ points chosen randomly among its $ c_{d,2} \log{n}-$nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log\log{n}$ instead of $\sim n \log{n}$ edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the $k-$nearest neighbors, one can often pick $k' \ll k$ random points out of the $k-$nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation, we illustrate this with several numerical examples.

artificial intelligence, machine learning, nearest neighbor, (18 more...)

arXiv.org Machine Learning

Nov-13-2017

arXiv.org PDF

Add feedback

Country:
- North America > United States > Texas (0.28)

Genre:
- Research Report (0.50)
- Workflow (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Statistical Learning
    - Nearest Neighbor Methods (0.37)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found