Variance reduced Shapley value estimation for trustworthy data valuation
Wu, Mengmeng; Jia, Ruoxi; Lin, Changle; Huang, Wei; Chang, Xiangyu
–arXiv.org Artificial Intelligence
Big data emerging across all walks of life has become a driving force of technological and economic development (Ghorbani and Zou, 2019; Huang et al., 2021). Sectors such as finance and healthcare increasingly rely on individuals' data for prediction, decision-making, and generating business value, which drives extensive data transactions (Barua et al., 2012). One of the most critical problems in data trading is data valuation. We consider data trading in data markets built on machine learning models, such as DATABRIGHT (Dao et al., 2018) and Sterling (Hynes et al., 2018), where the value of data is largely determined by its contribution to a specific machine learning model. We focus on data valuation in supervised learning, one of the main pillars of machine learning. The core challenge is how to fairly evaluate the contribution of each data point in the training set to the learning algorithm with respect to a particular performance metric. A natural way to address this is to treat each data point as a player in a cooperative game, so that the value of each player can be assessed through a utility function from a game-theoretic perspective (Jia et al., 2019b).
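The game-theoretic framing above is usually made concrete with the Shapley value: for n players and a utility U defined on subsets, player i receives phi_i = sum over S in N\{i} of |S|!(n-|S|-1)!/n! * [U(S with i) - U(S)]. The Python sketch below is an illustration under stated assumptions, not the authors' method. It computes phi exactly by subset enumeration and approximates it by plain Monte Carlo permutation sampling, a common baseline estimator; the paper's variance-reduction technique is not reproduced here. The utility (validation accuracy of a 1-nearest-neighbour classifier, with U of the empty set fixed at 0), the toy 1-D dataset, and the helper names knn_utility, exact_shapley, and permutation_shapley are all hypothetical.

```python
import itertools
import math
import random
from typing import Callable, FrozenSet, List, Sequence, Tuple

Point = Tuple[float, int]  # hypothetical 1-D toy data: (feature, label)
Utility = Callable[[FrozenSet[int]], float]


def knn_utility(train: Sequence[Point], val: Sequence[Point]) -> Utility:
    """Hypothetical utility U(S): validation accuracy of a 1-nearest-neighbour
    classifier that only sees the training points indexed by S. U(empty) = 0."""
    def utility(subset: FrozenSet[int]) -> float:
        if not subset:
            return 0.0
        correct = 0
        for x, y in val:
            # Closest training point in S; ties broken by the smaller index.
            nearest = min(subset, key=lambda i: (abs(train[i][0] - x), i))
            correct += int(train[nearest][1] == y)
        return correct / len(val)
    return utility


def exact_shapley(n: int, utility: Utility) -> List[float]:
    """Exact Shapley values by enumerating every subset S of the other players."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = |S| ranges over 0..n-1
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for combo in itertools.combinations(others, k):
                s = frozenset(combo)
                phi[i] += weight * (utility(s | {i}) - utility(s))
    return phi


def permutation_shapley(n: int, utility: Utility, num_permutations: int,
                        seed: int = 0) -> List[float]:
    """Plain Monte Carlo estimate: average each player's marginal contribution
    over uniformly random orderings of the players."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        prefix: FrozenSet[int] = frozenset()
        prev = utility(prefix)
        for i in order:
            prefix = prefix | {i}
            cur = utility(prefix)
            phi[i] += cur - prev  # marginal contribution of i in this ordering
            prev = cur
    return [v / num_permutations for v in phi]


# Usage on a hypothetical toy dataset: two clusters plus one point at 0.5 labelled 1.
train = [(0.0, 0), (1.0, 0), (5.0, 1), (6.0, 1), (0.5, 1)]
val = [(0.2, 0), (0.8, 0), (5.5, 1), (6.2, 1)]
utility = knn_utility(train, val)
phi_exact = exact_shapley(len(train), utility)
phi_mc = permutation_shapley(len(train), utility, num_permutations=2000, seed=0)
print([round(v, 3) for v in phi_exact])
print([round(v, 3) for v in phi_mc])
```

Exact enumeration costs on the order of 2^n utility evaluations per player, which is why sampling is used in practice. Because the marginal contributions within each sampled ordering telescope to U(N) - U(empty set), the permutation estimates sum to the same total as the exact values, while individual estimates fluctuate around them; reducing that fluctuation for a fixed sampling budget is the kind of goal a variance-reduced estimator targets.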
May-22-2023