Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models
Nagori, Aditya, Gautam, Ayush, Wiens, Matthew O., Nguyen, Vuong, Mugisha, Nathan Kenya, Kabakyenga, Jerome, Kissoon, Niranjan, Ansermino, John Mark, Kamaleswaran, Rishikesan
arXiv.org Artificial Intelligence
Clustering patient subgroups is essential for personalized care and efficient resource use. Traditional clustering methods struggle with high-dimensional, heterogeneous healthcare data and lack contextual understanding. This study evaluates Large Language Model (LLM) based clustering against classical methods using a pediatric sepsis dataset from a low-income country (LIC), containing 2,686 records with 28 numerical and 119 categorical variables. Patient records were serialized into text with and without a clustering objective. Embeddings were generated using quantized LLAMA 3.1 8B, DeepSeek-R1-Distill-Llama-8B with low-rank adaptation (LoRA), and Stella-En-400M-V5 models. K-means clustering was applied to these embeddings. Classical comparisons included K-Medoids clustering on UMAP- and FAMD-reduced mixed data. Cluster quality and distinctiveness were evaluated with Silhouette scores and statistical tests. Stella-En-400M-V5 achieved the highest Silhouette Score (0.86). LLAMA 3.1 8B with the clustering objective performed better with a higher number of clusters, identifying subgroups with distinct nutritional, clinical, and socioeconomic profiles. LLM-based methods outperformed classical techniques by capturing richer context and prioritizing key features. These results highlight the potential of LLMs for contextual phenotyping and informed decision-making in resource-limited settings.
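The two ends of the pipeline described in the abstract, serializing a record into text and scoring a clustering with the Silhouette coefficient, can be illustrated with a minimal Python sketch. This is not the authors' code: the field names, the "name: value" template, and the wording of the clustering objective are hypothetical, and the embedding and K-means steps are left out because they depend on the specific model and library used.

```python
from math import dist


def serialize_record(record, objective=None):
    """Turn one patient record (a dict) into a single line of text.

    The "name: value" template and the optional objective prefix are
    assumptions made for illustration, not the paper's actual prompt.
    """
    text = "; ".join(f"{name}: {value}" for name, value in record.items())
    return f"{objective} {text}" if objective else text


def silhouette_score(points, labels):
    """Mean Silhouette coefficient using Euclidean distance.

    For each point, a is the mean distance to the other members of its
    own cluster and b is the smallest mean distance to any other
    cluster; the coefficient is (b - a) / max(a, b).
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Convention: a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # dist(point, point) is 0, so dividing by len(own) - 1 gives
        # the mean distance to the *other* members of the cluster.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical record; the real dataset has 28 numerical and 119
# categorical variables per patient.
record = {"age_months": 14, "sex": "female", "muac_mm": 118}
plain = serialize_record(record)
guided = serialize_record(
    record, objective="Cluster patients by clinical profile."
)

# Toy 2-D "embeddings" forming two well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
toy_score = silhouette_score(toy_points, toy_labels)
```

In a real run, the serialized strings would be passed to an embedding model and the resulting vectors clustered before scoring. A value near 1 indicates compact, well-separated clusters, which is how a reported score such as 0.86 should be read.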
Aug-5-2025
- Country:
  - Africa
    - Kenya (0.04)
    - Uganda
      - Central Region > Kampala (0.04)
      - Western Region > Mbarara District (0.05)
  - Asia > India
    - Goa (0.04)
  - Europe
    - Ireland (0.04)
    - United Kingdom (0.04)
  - North America
    - Canada > British Columbia
      - Vancouver (0.05)
    - United States
      - Florida > Palm Beach County
        - Boca Raton (0.04)
      - New Jersey > Mercer County
        - Princeton (0.04)
      - North Carolina > Durham County
        - Durham (0.04)
  - South America > Bolivia (0.04)
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Technology: