On Adaptive Distance Estimation

Neural Information Processing Systems 

We provide a static data structure for distance estimation which supports {\it adaptive} queries. Concretely, given a dataset X \{x_i\}_{i 1} n of n points in \mathbb{R} d and 0 p \leq 2, we construct a randomized data structure with low memory consumption and query time which, when later given any query point q \in \mathbb{R} d, outputs a (1 \varepsilon) -approximation of \ q - x_i\ _p with high probability for all i\in[n] . The main novelty is our data structure's correctness guarantee holds even when the sequence of queries can be chosen adaptively: an adversary is allowed to choose the j th query point q_j in a way that depends on the answers reported by the data structure for q_1,\ldots,q_{j-1} . Previous randomized Monte Carlo methods do not provide error guarantees in the setting of adaptively chosen queries. Our memory consumption is \tilde O(nd/\varepsilon 2), slightly more than the O(nd) required to store X in memory explicitly, but with the benefit that our time to answer queries is only \tilde O(\varepsilon {-2}(n d)), much faster than the naive \Theta(nd) time obtained from a linear scan in the case of n and d very large.