Toward Metric Indexes for Incremental Insertion and Querying
Raff, Edward, Nicholas, Charles
In this work we explore the use of metric index structures, which accelerate nearest neighbor queries, in the scenario where we need to interleave insertions and queries during deployment. This use-case is inspired by a real-life need in malware analysis triage, and is surprisingly understudied. Existing literature tends to either focus on only final query efficiency, often does not support incremental insertion, or does not support arbitrary distance metrics. We modify and improve three algorithms to support our scenario of incremental insertion and querying with arbitrary metrics, and evaluate them on multiple datasets and distance metrics while varying the value of $k$ for the desired number of nearest neighbors. In doing so we determine that our improved Vantage-Point tree of Minimum-Variance performs best for this scenario.
Jan-12-2018
- Country:
- Europe
- Poland > Masovia Province
- Warsaw (0.04)
- Switzerland > Geneva
- Geneva (0.04)
- Poland > Masovia Province
- North America > United States
- California
- San Francisco County > San Francisco (0.14)
- Santa Clara County > San Jose (0.04)
- District of Columbia > Washington (0.14)
- Georgia > Fulton County
- Atlanta (0.04)
- Maryland
- Baltimore (0.04)
- Baltimore County (0.04)
- Charles County (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- New Jersey (0.04)
- New York > New York County
- New York City (0.14)
- California
- Europe
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: