Collaborating Authors

 Jaiswal, Shikhar


OOD-DiskANN: Efficient and Scalable Graph ANNS for Out-of-Distribution Queries

arXiv.org Artificial Intelligence

State-of-the-art algorithms for Approximate Nearest Neighbor Search (ANNS) such as DiskANN, FAISS-IVF, and HNSW build data dependent indices that offer substantially better accuracy and search efficiency over data-agnostic indices by overfitting to the index data distribution. When the query data is drawn from a different distribution - e.g., when index represents image embeddings and query represents textual embeddings - such algorithms lose much of this performance advantage. On a variety of datasets, for a fixed recall target, latency is worse by an order of magnitude or more for Out-Of-Distribution (OOD) queries as compared to In-Distribution (ID) queries. The question we address in this work is whether ANNS algorithms can be made efficient for OOD queries if the index construction is given access to a small sample set of these queries. We answer positively by presenting OOD-DiskANN, which uses a sparing sample (1% of index set size) of OOD queries, and provides up to ...

Since solving the problem exactly requires an expensive exhaustive scan of the database - which would be impractical for real-world indices that span billions of objects - practical interactive search systems use Approximate Nearest Neighbor Search (ANNS) algorithms with highly sub-linear query complexity [10, 18, 24, 30] to answer such queries. The quality of such ANN indices is often measured by k-recall@k which is the overlap between the top-k results of the index search with the ground truth k-nearest neighbors (k-NNs) in the corpus for the query, averaged over a representative query set. State-of-the-art algorithms for ANNS, such as graph-based indices [16, 24, 30] which use data-dependent index construction, achieve better query efficiency over prior data-agnostic methods like LSH [6, 18] (see Section A.1 for more details). Such efficiency enables these indices to serve queries with > 90% recall with a latency of a few milliseconds, required in interactive web scenarios.
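The k-recall@k measure described above can be illustrated with a minimal Python sketch. The function name `recall_at_k`, its arguments, and the toy ids are assumptions made for this illustration and are not taken from the paper or from any ANNS library; the sketch assumes each query comes with a ranked list of retrieved ids and a list of its true k nearest neighbors.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Average k-recall@k over a query set (illustrative helper, not from the paper).

    retrieved[i]    -- ids returned by the index for query i, best first
    ground_truth[i] -- ids of the true k nearest neighbors of query i
    """
    if len(retrieved) != len(ground_truth):
        raise ValueError("one ground-truth list is needed per query")
    if not retrieved:
        raise ValueError("the query set is empty")
    total = 0.0
    for found, truth in zip(retrieved, ground_truth):
        # Fraction of the true k-NNs that appear among the top-k results.
        total += len(set(found[:k]) & set(truth[:k])) / k
    return total / len(retrieved)


# Toy data: two queries, k = 3 (ids are made up for the example).
retrieved = [[1, 2, 3], [4, 5, 6]]
ground_truth = [[1, 3, 7], [4, 5, 6]]
score = recall_at_k(retrieved, ground_truth, k=3)
print(f"3-recall@3 = {score:.4f}")
```

In this toy case the first query recovers two of its three true neighbors and the second recovers all three, so the average is 5/6.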


MinUn: Accurate ML Inference on Microcontrollers

arXiv.org Artificial Intelligence

Running machine learning inference on tiny devices, known as TinyML, is an emerging research area. This task requires generating inference code that uses memory frugally, something standard ML frameworks are ill-suited for. A deployment framework for TinyML must a) be parametric in the number representation to take advantage of emerging representations like posits, b) carefully assign high precision to a few tensors so that most tensors can be kept in low precision while still maintaining model accuracy, and c) avoid memory fragmentation. We describe MinUn, the first TinyML framework that holistically addresses these issues to generate efficient code for ARM microcontrollers (e.g., Arduino Uno, Due and STM32H747) that outperforms the prior TinyML frameworks.