RephQA: Evaluating Readability of Large Language Models in Public Health Question Answering

Qiu, Weikang, Huang, Tinglin, Rullo, Ryan, Kuang, Yucheng, Maatouk, Ali, Ramos, S. Raquel, Ying, Rex

Oct-6-2025–arXiv.org Artificial Intelligence

Large Language Models (LLMs) hold promise in addressing complex medical problems. However, while most prior studies focus on improving accuracy and reasoning abilities, a significant bottleneck in developing effective healthcare agents lies in the readability of LLM-generated responses, specifically their ability to answer public health questions clearly and simply for people without medical backgrounds. In this work, we introduce RephQA, a benchmark for evaluating the readability of LLMs in public health question answering (QA). It contains 533 expert-reviewed QA pairs from 27 sources across 13 topics, and includes a proxy multiple-choice task to assess informativeness, along with two readability metrics: Flesch-Kincaid grade level and professional score. Evaluation of 25 LLMs reveals that most fail to meet readability standards, highlighting a gap between reasoning and effective communication. To address this, we explore four readability-enhancing strategies: standard prompting, chain-of-thought prompting, Group Relative Policy Optimization (GRPO), and a token-adapted variant. Token-adapted GRPO achieves the best results, a step toward more practical and user-friendly public health agents.
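For readers unfamiliar with the first readability metric named in the abstract, the following is a minimal sketch of the standard Flesch-Kincaid grade level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sentence splitter, word tokenizer, and vowel-group syllable counter are simplifying assumptions made for illustration; the abstract does not say how RephQA implements these steps, and the function names here are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the benchmark's method):
    count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Standard Flesch-Kincaid grade level of `text`.

    Sentences are split on runs of '.', '!' or '?'; words are runs of
    letters and apostrophes. Both are simplifications.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    plain = "Wash your hands. Drink clean water."
    dense = "Prophylactic immunization substantially attenuates communicable disease transmission."
    # Lower scores indicate text readable at a lower school grade.
    print(round(flesch_kincaid_grade(plain), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

Under this heuristic, short sentences built from short words score lower than a single sentence packed with polysyllabic terms, which is the property a readability benchmark for lay audiences relies on. A production evaluation would likely use a dictionary-based syllable counter instead.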

large language model, machine learning, natural language, (21 more...)

Country:
- North America > United States (0.67)
- Asia > Middle East (0.46)

Genre:
- Research Report (1.00)

Industry:
- Health & Medicine
  - Public Health (1.00)
  - Diagnostic Medicine (1.00)
  - Consumer Health (1.00)
  - Therapeutic Area
    - Oncology (1.00)
    - Cardiology/Vascular Diseases (1.00)
    - Immunology (0.94)
    - Endocrinology (0.69)
    - Psychiatry/Psychology (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
