Stratified cross-validation for unbiased and privacy-preserving federated learning
Bey, R., Goussault, R., Benchoufi, M., Porcher, R.
Large-scale collections of electronic records are both an opportunity to develop more accurate prediction models and a threat to privacy. To limit privacy exposure, new privacy-enhancing techniques are emerging, such as federated learning, which enables large-scale data analysis while avoiding the centralization of records in a single database that would represent a critical point of failure. Although promising for privacy protection, federated learning prevents the use of some data-cleaning algorithms, thus inducing new biases. In this work, we focus on the recurrent problem of duplicated records, which, if not handled properly, may cause over-optimistic estimates of a model's performance. We introduce and discuss stratified cross-validation, a validation methodology that leverages stratification techniques to prevent data leakage in federated learning settings without relying on demanding deduplication algorithms.
Jan-23-2020
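The leakage-prevention idea in the abstract can be illustrated with a minimal sketch. This is not the authors' exact procedure; it only assumes that duplicated records share some attributes (here `birth_year` and `sex`, chosen purely for illustration) and that every center applies the same deterministic rule to map a stratum to a fold. Under that assumption, copies of a record held by different centers fall in the same fold, so a duplicate cannot appear in both training and validation data, and no records need to be exchanged for deduplication. The names `fold_of`, `stratum_key`, and `local_folds` are hypothetical.

```python
import hashlib
from collections import defaultdict


def stratum_key(record):
    # Assumption: duplicates agree on these attributes, so they share a stratum.
    return f"{record['birth_year']}|{record['sex']}"


def fold_of(key, n_folds):
    # Deterministic mapping from stratum to fold, identical at every center.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def local_folds(records, n_folds=5):
    # Each center splits its own records locally; nothing leaves the center.
    folds = defaultdict(list)
    for rec in records:
        folds[fold_of(stratum_key(rec), n_folds)].append(rec)
    return [folds[i] for i in range(n_folds)]


# Two centers hold a copy of the same patient record (id "p1").
center_a = [
    {"id": "p1", "birth_year": 1980, "sex": "F"},
    {"id": "p2", "birth_year": 1975, "sex": "M"},
]
center_b = [
    {"id": "p1", "birth_year": 1980, "sex": "F"},
    {"id": "p3", "birth_year": 1992, "sex": "F"},
]

folds_a = local_folds(center_a)
folds_b = local_folds(center_b)


def fold_index(folds, record_id):
    # Return the index of the fold that contains the given record id.
    for i, fold in enumerate(folds):
        if any(rec["id"] == record_id for rec in fold):
            return i
    return None


# The duplicate lands in the same fold at both centers.
same_fold = fold_index(folds_a, "p1") == fold_index(folds_b, "p1")
```

In this sketch whole strata are assigned to folds, so distinct patients who happen to share a stratum are also kept together; that is the price paid for not identifying duplicates explicitly.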