A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG
Arshia Kermani, Veronica Perez-Rosas, Vangelis Metsis
arXiv.org Artificial Intelligence
This study presents a systematic comparison of three approaches for the analysis of mental health text using large language models (LLMs): prompt engineering, retrieval augmented generation (RAG), and fine-tuning. Using LLaMA 3, we evaluate these approaches on emotion classification and mental health condition detection tasks across two datasets. Fine-tuning achieves the highest accuracy (91% for emotion classification, 80% for mental health conditions) but requires substantial computational resources and large training sets, while prompt engineering and RAG offer more flexible deployment with moderate performance (40-68% accuracy). Our findings provide practical insights for implementing LLM-based solutions in mental health applications, highlighting the trade-offs between accuracy, computational requirements, and deployment flexibility.
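The prompt-engineering approach the abstract describes can be illustrated with a minimal zero-shot classification template. This is only a sketch: the label set, system instruction, and wording below are assumptions for illustration, not the prompts the authors actually used.

```python
# Hypothetical emotion label set -- the paper's datasets may use different labels.
EMOTION_LABELS = ["joy", "sadness", "anger", "fear", "surprise", "neutral"]

def build_zero_shot_prompt(text: str, labels: list[str]) -> str:
    """Construct a zero-shot emotion-classification prompt for an
    instruction-tuned LLM such as LLaMA 3 (template is illustrative)."""
    label_list = ", ".join(labels)
    return (
        "You are a mental health text analyst.\n"
        f"Classify the emotion expressed in the text below as one of: {label_list}.\n"
        "Answer with a single label.\n\n"
        f"Text: {text}\n"
        "Label:"
    )

prompt = build_zero_shot_prompt("I can't stop worrying about tomorrow.", EMOTION_LABELS)
print(prompt)
```

In a RAG variant, retrieved exemplar texts with known labels would be prepended to this template before the target text; fine-tuning instead updates the model weights on labeled pairs and needs no template at inference time.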
Mar-31-2025