Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms

Aug-27-2019, 05:13:11 GMT–#artificialintelligence

Given a data set D containing millions of data points and a data consumer who is willing to pay $X to train a machine learning (ML) model over D, how should we distribute this $X among the data points to reflect each one's "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: obtaining Shapley values for all N data points requires O(2^N) model evaluations for exact computation and O(N log N) for (ϵ, δ)-approximation. In this paper, we focus on one popular family of ML models relying on K-nearest neighbors (KNN). The most surprising result is that for unweighted KNN classifiers and regressors, the Shapley value of all N data points can be computed, exactly, in O(N log N) time -- an exponential improvement in computational complexity!
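The exact computation mentioned above can be illustrated for a single test point with a minimal sketch. The abstract gives neither the utility function nor the recursion, so both are assumptions here. The utility is taken to be the fraction of the K votes, cast by the nearest min(K, |S|) points of a subset S, that match the test label. The recursion sorts the training points by distance and walks from the farthest to the nearest. The function names (knn_utility, knn_shapley, brute_force_shapley) are hypothetical, and the sketch assumes 1 <= K <= N. A brute-force reference that averages marginal contributions over all orderings is included so the fast version can be checked on small inputs.

```python
from itertools import permutations
from math import factorial


def knn_utility(subset, dists, labels, y_test, k):
    """Assumed utility: share of the K votes, cast by the nearest
    min(K, |subset|) points, that agree with the test label."""
    nearest = sorted(subset, key=lambda i: (dists[i], i))[:k]
    return sum(labels[i] == y_test for i in nearest) / k


def knn_shapley(dists, labels, y_test, k):
    """Shapley value of every training point for one test point.

    Sketch of a sort-then-recurse scheme (assumed, not quoted from the
    abstract). The sort dominates the cost: O(N log N).
    """
    n = len(dists)
    if not 1 <= k <= n:
        raise ValueError("this sketch assumes 1 <= k <= len(dists)")
    # Indices ordered from nearest to farthest; ties broken by index.
    order = sorted(range(n), key=lambda i: (dists[i], i))
    match = [1.0 if labels[i] == y_test else 0.0 for i in order]
    values = [0.0] * n
    # Base case: the farthest point.
    values[order[-1]] = match[-1] / n
    # Walk inward from the second-farthest point to the nearest one.
    for pos in range(n - 2, -1, -1):
        rank = pos + 1  # 1-based rank of the point at this position
        step = (match[pos] - match[pos + 1]) / k * min(k, rank) / rank
        values[order[pos]] = values[order[pos + 1]] + step
    return values


def brute_force_shapley(dists, labels, y_test, k):
    """Reference implementation: average each point's marginal
    contribution over all N! orderings. Only usable for tiny N."""
    n = len(dists)
    values = [0.0] * n
    for perm in permutations(range(n)):
        seen, prev = [], 0.0
        for i in perm:
            seen.append(i)
            cur = knn_utility(seen, dists, labels, y_test, k)
            values[i] += cur - prev
            prev = cur
    return [v / factorial(n) for v in values]
```

Calling knn_shapley(dists, labels, y_test, k) with a list of distances to the test point and the matching list of training labels returns one value per training point. Under the assumed utility, the values sum to the utility of the full training set, which is what lets a fixed payment be split in proportion to them. Averaging the per-test-point values over a test set would give a valuation for the whole task.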

algorithm, artificial intelligence, machine learning
