Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data

Ghaffarzadeh-Esfahani, Mohammadreza, Ghaffarzadeh-Esfahani, Mahdi, Salahi-Niri, Arian, Toreyhi, Hossein, Atf, Zahra, Mohsenzadeh-Kermani, Amirali, Sarikhani, Mahshad, Tajabadi, Zohreh, Shojaeian, Fatemeh, Bagheri, Mohammad Hassan, Feyzi, Aydin, Tarighatpayma, Mohammadamin, Gazmeh, Narges, Heydari, Fateme, Afshar, Hossein, Allahgholipour, Amirreza, Alimardani, Farid, Salehi, Ameneh, Asadimanesh, Naghmeh, Khalafi, Mohammad Amin, Shabanipour, Hadis, Moradi, Ali, Zadeh, Sajjad Hossein, Yazdani, Omid, Esbati, Romina, Maleki, Moozhan, Nasr, Danial Samiei, Soheili, Amirali, Majlesi, Hossein, Shahsavan, Saba, Soheilipour, Alireza, Goudarzi, Nooshin, Taherifard, Erfan, Hatamabadi, Hamidreza, Samaan, Jamil S, Savage, Thomas, Sakhuja, Ankit, Soroush, Ali, Nadkarni, Girish, Darazam, Ilad Alavi, Pourhoseingholi, Mohamad Amin, Safavi-Naini, Seyed Amir Ahmad

Sep-2-2024–arXiv.org Artificial Intelligence

Background: This study aimed to evaluate and compare the performance of classical machine learning models (CMLs) and large language models (LLMs) in predicting mortality associated with COVID-19 by utilizing a high-dimensional tabular dataset.

Materials and Methods: We analyzed data from 9,134 COVID-19 patients collected across four hospitals. Seven CML models, including XGBoost and random forest (RF), were trained and evaluated. The structured data was converted into text for zero-shot classification by eight LLMs, including GPT-4 and Mistral-7b. Additionally, Mistral-7b was fine-tuned using the QLoRA approach to enhance its predictive capabilities.

Results: Among the CML models, XGBoost and RF achieved the highest accuracy, with F1 scores of 0.87 for internal validation and 0.83 for external validation. In the LLM category, GPT-4 was the top performer with an F1 score of 0.43. Fine-tuning Mistral-7b significantly improved its recall from 1% to 79%, resulting in an F1 score of 0.74, which was stable during external validation.

Conclusion: While LLMs show moderate performance in zero-shot classification, fine-tuning can significantly enhance their effectiveness, potentially aligning them closer to CML models. However, CMLs still outperform LLMs in high-dimensional tabular data tasks.
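The abstract mentions converting structured records into text for zero-shot classification and reporting F1 scores. As a rough illustration only, the sketch below shows one plausible way to serialize a tabular row into a prompt and to compute F1 from confusion counts. The feature names, the prompt wording, and the helper names `row_to_prompt` and `f1_score` are assumptions made for this example; the abstract does not describe the authors' actual template or code.

```python
from typing import Mapping


def row_to_prompt(row: Mapping[str, object]) -> str:
    """Serialize one tabular record into a plain-text zero-shot prompt.

    The prompt wording here is hypothetical, not the study's template.
    """
    # Skip missing values so the prompt only states observed features.
    facts = [f"{name}: {value}" for name, value in row.items() if value is not None]
    return (
        "Patient record:\n"
        + "\n".join(facts)
        + "\nQuestion: Will this patient die from COVID-19? Answer yes or no."
    )


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical record; the feature names are illustrative, not from the study.
example = {"age": 67, "oxygen_saturation": 88, "diabetes": "yes", "crp": None}
prompt = row_to_prompt(example)
```

Because F1 combines precision and recall, a model with very low recall will have a low F1 no matter how precise its few positive predictions are, which is why a large recall gain after fine-tuning can raise F1 substantially.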

Country:
- North America
  - United States
    - Maryland > Baltimore (0.04)
    - New York > New York County
      - New York City (0.04)
    - California
      - Los Angeles County > Los Angeles (0.28)
      - Santa Clara County
        - Stanford (0.04)
        - Palo Alto (0.04)
  - Canada > Ontario
    - Durham Region > Oshawa (0.04)
- Europe
  - United Kingdom > England
    - Nottinghamshire > Nottingham (0.14)
  - Finland > Uusimaa
    - Helsinki (0.04)
- Asia > Middle East
  - Iran
    - Tehran Province > Tehran (0.08)
    - Isfahan Province > Isfahan (0.04)
    - Fars Province > Shiraz (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Health & Medicine
  - Health Care Technology > Medical Record (0.93)
  - Therapeutic Area
    - Infections and Infectious Diseases (1.00)
    - Immunology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Performance Analysis > Accuracy (1.00)
    - Neural Networks > Deep Learning (1.00)
