Artificial Intelligence health advice accuracy varies across languages and contexts

Garg, Prashant, Fetzer, Thiemo

arXiv.org Artificial Intelligence 

Using basic health statements authorized by UK and EU registers and ~9,100 journalist-vetted public-health assertions on topics such as abortion, COVID-19 and politics, drawn from sources ranging from peer-reviewed journals and government advisories to social media and news across the political spectrum, we benchmark six leading large language models in 21 languages. We find that -- despite high accuracy on English-centric textbook claims -- performance falls in multiple non-European languages and fluctuates by topic and source, highlighting the urgency of comprehensive multilingual, domain-aware validation before deploying AI in global health communication.

Main Text:

Recent evidence suggests that 17% of U.S. adults -- and a striking 25% of those aged 18-29 -- now consult AI chatbots for health questions at least once a month (1), while in Australia nearly 10% of adults did so in just the first half of 2024 (2). Beyond mere curiosity, these tools can substantially improve comprehension: running standard discharge notes through GPT-4 reduced the average reading grade level from 11th to 6th and boosted patient-understandability scores from 13% to 81% (3). Yet as fluently as large language models (LLMs) can rephrase medical text, they lack formal clinical vetting and still rely on statistical patterns in their training data. When generative AI echoes unverified or dangerous claims, it risks amplifying harm.