Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare

Natallia Kokash, Lei Wang, Thomas H. Gillespie, Adam Belloum, Paola Grosso, Sara Quinney, Lang Li, Bernard de Bono

arXiv.org Artificial Intelligence 

The rise of electronic health records (EHRs) has unlocked new opportunities for medical research, but privacy regulations and data heterogeneity remain key barriers to large-scale machine learning. Federated learning (FL) enables collaborative modeling without sharing raw data, yet it still depends on harmonizing diverse clinical datasets across participating sites. This paper presents a two-step data alignment strategy that integrates ontologies and large language models (LLMs) to support secure, privacy-preserving FL in healthcare, and demonstrates its effectiveness in a real-world project involving semantic mapping of EHR data.