Privacy Preserving Identification Using Sparse Approximation with Ambiguization
Razeghi, Behrooz; Voloshynovskiy, Slava; Kostadinov, Dimche; Taran, Olga
A. Identification and ANN Search

Many modern applications, such as biometrics, digital physical object security, and data generated by connected objects in the IoT, require privacy-preserving identification of a query with respect to a given dataset. In practice, identification is based on an approximate nearest neighbor (ANN) search that returns a list of indices corresponding to the nearest neighbor (NN) items. At the final refinement stage, this list can be refined in a private setting and a single index is declared as the identified one. The identification problem faces the curse of dimensionality. For this reason, exact identification is replaced by a search for a list of the closest items, i.e., one trades identification accuracy for lower search complexity. In recent years, many methods providing efficient ANN solutions for multi-billion entry datasets have been proposed; we name a few of them without claiming to be exhaustive [1]-[3].

B. Search in Privacy Preserving Settings: Main Considerations

Given the massive amount of data and the availability of modern distributed storage and computing facilities, many ANN problems are considered in a setting where the data user applies appropriate protection measures to their datasets and outsources them to third parties (servers) that possess powerful storage, communication, and computing facilities. The need for data protection arises from several considerations: the cost of data collection, and the role of data as a "product" of great value in the era of machine learning, since it can be used to train and prune new and existing machine learning tools. Moreover, the server might want to discover hidden relationships in the data.
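The two-stage identification described in Section A (an ANN search that returns a shortlist of indices, followed by a refinement that declares a single index) can be illustrated with a minimal sketch. It is not the method of this paper: the coarse stage below uses sign-based binary codes compared by Hamming distance as a stand-in for whatever protected representation the server holds, and all function names (`sign_code`, `shortlist`, `refine`, `identify`) and the toy Gaussian data are assumptions made for illustration only.

```python
import random


def sign_code(vec):
    """Coarse binary code (assumed stand-in): one bit per coordinate."""
    return [1 if x > 0 else 0 for x in vec]


def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Exact squared Euclidean distance between two real vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def shortlist(query, codes, list_size):
    """Stage 1 (ANN search): indices of the list_size closest codes."""
    q = sign_code(query)
    ranked = sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))
    return ranked[:list_size]


def refine(query, dataset, candidates):
    """Stage 2 (refinement): exact distance, computed on the shortlist only."""
    return min(candidates, key=lambda i: squared_euclidean(query, dataset[i]))


def identify(query, dataset, codes, list_size=10):
    """Shortlist with cheap codes, then declare a single index."""
    return refine(query, dataset, shortlist(query, codes, list_size))


# Toy data (hypothetical): 200 items of dimension 32.
rng = random.Random(0)
dataset = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(200)]
codes = [sign_code(v) for v in dataset]

# The query is a slightly noisy copy of item 17.
query = [x + rng.gauss(0, 0.05) for x in dataset[17]]
identified = identify(query, dataset, codes)
```

In an outsourced setting, one would expect only the coarse codes and the `shortlist` step to live on the server, while `refine` runs privately on the side that holds the original vectors. The `list_size` parameter controls the trade-off mentioned above: a longer list costs more at refinement but is less likely to miss the true item.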
Sep-29-2017