Stratified cross-validation for unbiased and privacy-preserving federated learning

Bey, R., Goussault, R., Benchoufi, M., Porcher, R.

Jan-23-2020–arXiv.org Machine Learning

Large-scale collections of electronic records constitute both an opportunity for the development of more accurate prediction models and a threat to privacy. To limit privacy exposure, new privacy-enhancing techniques are emerging, such as federated learning, which enables large-scale data analysis while avoiding the centralization of records in a single database that would represent a critical point of failure. Although promising for privacy protection, federated learning prevents the use of some data-cleaning algorithms, thus inducing new biases. In this work, we focus on the recurrent problem of duplicated records that, if not handled properly, may cause over-optimistic estimates of a model's performance. We introduce and discuss stratified cross-validation, a validation methodology that leverages stratification techniques to prevent data leakage in federated learning settings without relying on demanding deduplication algorithms.
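The abstract gives no algorithmic detail, so the following is only a minimal sketch of one way the idea could be read, not the authors' procedure. It assumes that duplicated records share the same values on a few stratifying covariates, and that every center derives the fold of a record from those covariates alone. Under that assumption, copies of a record held by different centers land in the same fold without any record being exchanged or deduplicated. The function names (`fold_for_record`, `split_center`), the covariates (`birth_year`, `sex`), and the use of a hash are all hypothetical choices made for illustration.

```python
import hashlib


def fold_for_record(record, strat_keys, n_folds):
    """Map a record to a fold using only its stratifying covariates.

    Hypothetical helper: the fold depends on the covariate values alone,
    so identical records get the same fold in every center.
    """
    # Canonical string built from the stratifying covariates only.
    key = "|".join(str(record[k]) for k in strat_keys)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def split_center(records, strat_keys, n_folds, test_fold):
    """Split one center's local records into train and test sets."""
    train, test = [], []
    for record in records:
        if fold_for_record(record, strat_keys, n_folds) == test_fold:
            test.append(record)
        else:
            train.append(record)
    return train, test


# Toy data (invented for illustration): the first record of each
# center is a duplicate of the same individual.
center_a = [
    {"birth_year": 1950, "sex": "F", "y": 1},
    {"birth_year": 1962, "sex": "M", "y": 0},
]
center_b = [
    {"birth_year": 1950, "sex": "F", "y": 1},
    {"birth_year": 1981, "sex": "F", "y": 0},
]

strat_keys = ["birth_year", "sex"]
n_folds = 5

# Each center computes its split locally; no records leave the center.
for test_fold in range(n_folds):
    train_a, test_a = split_center(center_a, strat_keys, n_folds, test_fold)
    train_b, test_b = split_center(center_b, strat_keys, n_folds, test_fold)
```

In this sketch the duplicated record is either in the test set of both centers or in the training set of both, so it cannot be trained on in one center and evaluated in another. Distinct individuals who happen to share the same covariate values are also grouped into the same fold, which is a side effect of stratifying rather than deduplicating.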

covariate, federated learning, stratified cross-validation, (15 more...)

Country:
- North America > United States
  - Nevada (0.04)
  - Texas > Dallas County
    - Dallas (0.04)
  - Colorado > Denver County
    - Denver (0.04)
- Europe
  - Norway > Central Norway
    - Trøndelag > Trondheim (0.04)
  - France
    - Île-de-France > Paris
      - Paris (0.14)
    - Pays de la Loire > Loire-Atlantique
      - Nantes (0.04)
- Asia
  - Nepal (0.04)
  - Middle East > Jordan (0.04)

Genre:
- Research Report (1.00)

Industry:
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.53)
  - Artificial Intelligence > Machine Learning
    - Performance Analysis > Cross Validation (0.67)
    - Ensemble Learning (0.47)
    - Statistical Learning (0.46)
    - Neural Networks > Deep Learning (0.46)
