MedRep: Medical Concept Representation for General Electronic Health Record Foundation Models
Kim, Junmo, Lee, Namkyeong, Kim, Jiwon, Kim, Kwangsoo
arXiv.org Artificial Intelligence
Electronic health record (EHR) foundation models have attracted growing attention because of their improved performance on various medical tasks. Despite rapid advances, a fundamental limitation remains: these models cannot process unseen medical codes that fall outside their vocabulary. This limitation restricts the generalizability of EHR foundation models and hinders the integration of models trained with different vocabularies. To alleviate this problem, we propose a set of novel medical concept representations (MedRep) for EHR foundation models, based on the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). For concept representation learning, we enrich the information of each concept with a minimal definition generated through large language model (LLM) prompts, and we complement the text-based representations with the graph ontology of the OMOP vocabulary. Our approach outperforms both the vanilla EHR foundation model and a model using a previously introduced medical code tokenizer on diverse prediction tasks. We also demonstrate the generalizability of MedRep through external validation.
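The out-of-vocabulary limitation described in the abstract can be illustrated with a small toy sketch. This is not the MedRep method: the vocabulary, the code identifiers, the descriptions, and the hashed bag-of-words encoder are all hypothetical stand-ins, chosen only to show why an index lookup collapses unseen codes while a description-derived vector can still be computed for them.

```python
import hashlib
import math

# Hypothetical toy vocabulary of codes seen during pretraining (assumption).
VOCAB = {"[UNK]": 0, "seen:001": 1, "seen:002": 2}


def lookup_index(code: str) -> int:
    """Index-based lookup: every unseen code collapses to the [UNK] index."""
    return VOCAB.get(code, VOCAB["[UNK]"])


def text_vector(description: str, dim: int = 64) -> list:
    """Toy stand-in for a text encoder: hashed bag-of-words, L2-normalised.

    A real system would use a learned encoder; this only shows that a vector
    can be derived from a description without any vocabulary entry.
    """
    vec = [0.0] * dim
    for token in description.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two already-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two hypothetical unseen codes with illustrative descriptions.
unseen = {
    "unseen:A": "type 2 diabetes mellitus",
    "unseen:B": "essential hypertension",
}

# Both unseen codes receive the same index, so the model cannot tell them apart.
indices = {code: lookup_index(code) for code in unseen}

# Description-derived vectors remain available for codes outside the vocabulary.
vectors = {code: text_vector(text) for code, text in unseen.items()}
```

In this sketch both unseen codes map to index 0, whereas each still receives its own description-derived vector, which is the general motivation for representing concepts by their content instead of by a fixed vocabulary slot.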
Aug-15-2025
- Country:
  - Asia
    - China (0.04)
    - South Korea > Seoul
      - Seoul (0.04)
  - North America > United States
    - Maryland > Montgomery County
      - Rockville (0.04)
    - Nebraska (0.04)
- Genre:
  - Research Report > Experimental Study (1.00)