Optimal Transport-based Domain Alignment as a Preprocessing Step for Federated Learning

Luiz Manella Pereira, M. Hadi Amini

arXiv.org Artificial Intelligence 

Federated learning (FL) offers a compelling framework for scenarios in which data cannot be centrally aggregated due to privacy constraints, thereby promoting compliance with data protection regulations and enhancing scalability [1]. Beyond its foundational role in privacy-preserving learning, FL also facilitates model personalization, i.e., adapting learning outcomes to individual users across the network, an increasingly relevant objective given the heterogeneity of user behavior and datasets. A comprehensive overview of the challenges and practical implementations of personalized federated learning is presented in [2]. Despite its broad applicability, particularly in contexts with stringent data privacy requirements, FL introduces a set of constraints that must be carefully addressed to ensure robust and efficient model training: limited communication bandwidth, restricted computation at edge devices, privacy-preservation requirements, and data heterogeneity and imbalance. Dataset imbalance in FL emerges when edge devices possess non-uniform class distributions, disparate dataset sizes, or varying data quality [3, 4]. In this work, we propose a preprocessing framework that addresses this imbalance challenge in a model- and algorithm-agnostic manner. Our method aligns and transforms local datasets into a shared representation space that captures statistical information from all participating agents in the network.
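To make the alignment idea concrete, the sketch below illustrates one standard way such a preprocessing step could look: an entropic-regularized optimal transport plan (Sinkhorn iterations) between a client's local samples and a shared reference set, followed by a barycentric projection that maps each local sample into the reference's representation space. This is an illustrative sketch under our own assumptions (the function names `sinkhorn` and `align_to_reference`, the uniform sample weights, and the use of a single shared reference set are ours), not the paper's exact algorithm.

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.05, n_iters=200):
    """Entropic-regularized OT: transport plan between histograms
    a (source) and b (target) for cost matrix C (Sinkhorn iterations)."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def align_to_reference(X_local, X_ref, reg=0.05):
    """Map local samples into the reference space via the barycentric
    projection of the OT plan (each mapped point is a convex
    combination of reference points)."""
    # Squared Euclidean cost, normalized for numerical stability.
    C = ((X_local[:, None, :] - X_ref[None, :, :]) ** 2).sum(-1)
    C = C / C.max()
    a = np.full(len(X_local), 1.0 / len(X_local))  # uniform weights (assumption)
    b = np.full(len(X_ref), 1.0 / len(X_ref))
    P = sinkhorn(a, b, C, reg)
    return (P / P.sum(axis=1, keepdims=True)) @ X_ref

# Toy example: a client whose data is shifted relative to the shared reference.
rng = np.random.default_rng(0)
X_client = rng.normal(loc=3.0, scale=1.0, size=(50, 2))  # shifted local data
X_shared = rng.normal(loc=0.0, scale=1.0, size=(60, 2))  # shared reference set
X_aligned = align_to_reference(X_client, X_shared)
```

After alignment, the client's samples lie in the convex hull of the reference set, so their statistics move toward those of the shared space; each client could apply this locally before any FL training round, keeping the method agnostic to the downstream model and aggregation algorithm.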