A Universal Metric of Dataset Similarity for Cross-silo Federated Learning

Apr-29-2024, arXiv.org Artificial Intelligence

Federated learning (FL) is increasingly used in domains such as healthcare to facilitate collaborative model training without data sharing. However, datasets located at different sites are often non-identically distributed, which degrades model performance in FL. Most existing methods for assessing these distribution shifts are limited by being dataset- or task-specific. Moreover, these metrics can only be calculated by exchanging data, a practice restricted in many FL scenarios. To address these challenges, we propose a novel metric for assessing dataset similarity. Our metric exhibits several desirable properties for FL: it is dataset-agnostic, is calculated in a privacy-preserving manner, and is computationally efficient, requiring no model training. In this paper, we first establish a theoretical connection between our metric and training dynamics in FL. Next, we extensively evaluate our metric on a range of datasets, including synthetic, benchmark, and medical imaging datasets. We demonstrate that our metric shows a robust and interpretable relationship with model performance and can be calculated in a privacy-preserving manner. We believe that this metric, as the first federated dataset similarity metric, can better facilitate successful collaborations between sites.

Country:
- Oceania > Australia (0.04)
- North America > United States
  - New York (0.04)
  - Virginia (0.04)
  - California (0.04)
- Europe
  - Austria > Vienna (0.05)
  - Switzerland > Zürich
    - Zürich (0.14)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.67)

Industry:
- Information Technology > Security & Privacy (1.00)
- Health & Medicine
  - Health Care Technology (1.00)
  - Diagnostic Medicine > Imaging (0.34)

Technology:
- Information Technology
  - Data Science (1.00)
  - Artificial Intelligence > Machine Learning
    - Neural Networks (0.94)
    - Statistical Learning (0.68)
